Courses


COMETT  2  Project  on 
Chemometrics  and 
Qualimetrics 

The EEC has awarded projects to European chemometricians in its COMETT program. The objective of the COMETT program is to organize industry-oriented training on a transnational level in advanced technological subjects. The program is open to all 12 EEC countries and also to the EFTA countries (Norway, Sweden, Finland, Iceland, Austria and Switzerland). Four types of projects were awarded, namely:

- Creation of a network for analyzing training needs, organizing exchanges, publicizing sources and learning material, etc. This network is called Eurochemometrics.

- Exchange of students and staff. Such an exchange must be transnational and must also involve both industry and university (for example, a university in Belgium can send one of its students to a company in Switzerland).

- Short course on method validation. This project is coordinated by Dr. H. Smit (Universiteit van Amsterdam, Laboratorium voor Analytische Scheikunde, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands).
- Demonstration (pilot) project on a package of courses and training materials (Chemometrics and qualimetrics for the chemical, pharmaceutical and agroalimentary industries).

Except for the course coordinated by Dr. Smit, the projects are coordinated by the author of this news item.

The most important project is, without doubt, the pilot project. It proposes courses on four levels:

- Introductory and integration courses. The introductory courses are 2- to 3-day general courses, meant for countries where chemometrics has progressed to a lesser extent. Integration courses are those which combine chemometrics with more familiar subjects. An example of such a course is the one organized (in French) by Prof. Ducauze and Dr. Feinberg in Paris (Institut National Agronomique, Laboratoire de Chimie Analytique, 16 rue Claude Bernard, 75231 Paris Cedex 05, France). By teaching chemometrics in the same program as instrumental laboratory methods, it aims to integrate chemometrics into the more general body of analytical chemical methodology.

- General long courses. These courses are similar to those organized in the earlier, less ambitious COMETT 1 project. The course lasts about 5 days, is organized by different countries in turn, and has many lecturers from industry and university. Such schools have been organized earlier in Aix-en-Provence, Gargnano, Tortosa, Bristol and Bruges. The next school will be organized in or around Nijmegen. (For details, write to Dr. L. Buydens, Laboratorium voor Analytische Chemie, Katholieke Universiteit Nijmegen, Toernooiveld, 6525 ED Nijmegen, The Netherlands.)

- Specialized short courses, where subjects can be treated in greater depth. Many subjects are possible, but those that seem to be favoured are experimental design, multivariate calibration, method validation and quality assurance, and expert systems. A list of courses available for in-house teaching will also be made available.

- European masters degree. The partners in the project will try to develop a degree, the aim of which is to train chemometricians with a sufficiently broad knowledge.

The Eurochemometrics consortium will also produce ‘distance learning’, courseware and teaching aids. For instance, Olav Kvalheim will complete his SIRIUS ADVISER program with a videotape introduction course.

The total level of expenditure is about 2.5 million ECU (i.e., about US$3.2 million), of which the EEC pays half. There are 70 partners (about 30 industries, 30 universities and 10 research institutes). The project is coordinated locally by 12 centres. One of these is devoted to distance learning (coordinator Dr. R. Brereton, University of Bristol, School of Chemistry, Cantock’s Close, Bristol BS8 1TS, U.K.) and the other eleven to organizing courses and producing teaching aids and courseware. The list of these centres is given below, together with the names and addresses of the coordinators. Further information can be obtained from the author of this article or from the local centres.

Norway/Denmark. Coordinator: O. Kvalheim, University of Bergen, Department of Chemistry, Realfagbygget, Allegt. 41, 5000 Bergen, Norway

The Netherlands. Coordinator: L. Buydens, Katholieke Universiteit Nijmegen, Laboratorium voor Analytische Chemie, Toernooiveld, 6525 ED Nijmegen, The Netherlands

Sweden/Finland. Coordinator: P. Geladi, Umeå Universitet, Department of Organic Chemistry, 90187 Umeå, Sweden

Austria/Germany/Switzerland. Coordinator: W. Wegscheider, University of Technology Graz, Institut für Analytische Chemie, Technikerstrasse 4, 8010 Graz, Austria

France (North). Coordinator: M. Feinberg, I.N.A., Laboratoire de Chimie Analytique, 16 rue Claude Bernard, 75231 Paris Cedex 05, France

France (South). Coordinator: R. Phan-Tan-Luu, LPRAI, Centre de St.-Jérôme, Université d’Aix-Marseille III, rue Henri Poincaré, 13397 Marseille Cedex 13, France

United Kingdom/Ireland. Coordinator: S.J. Haswell, The University of Hull, School of Chemistry, Hull, HU6 7RX, U.K.

United Kingdom. R. Brereton, University of Bristol, School of Chemistry, Cantock’s Close, Bristol BS8 1TS, U.K.

United Kingdom. S. Pringle, University of Bristol, Department for Continuing Education, Wills Memorial Building, Queen’s Road, Bristol BS8 1HR, U.K.

Italy. Coordinator: M. Forina, Istituto di Analisi e Tecnologie Farmaceutiche ed Alimentari, Via Brigata Salerno (ponte), 16147 Genova, Italy

Spain/Portugal. Coordinator: F.X. Rius, Universitat de Barcelona, Departament de Química, Pl. Imperial Tàrraco 1, 43005 Tarragona, Spain

The first courses to be announced within the COMETT scheme are:

- Chemometrie und künstliche Intelligenz [Chemometrics and artificial intelligence] (8-12/4/91, Ruhr-Universität Bochum). Information: W. Wegscheider

- Optimisation: stratégies et méthodes [Optimization: strategies and methods] (13-15/3/91, Paris). Information: M. Feinberg

- Qualité et validation des méthodes. La bonne pratique de laboratoire [Quality and validation of methods. Good laboratory practice] (9-11/10/91, Paris). Information: M. Feinberg

- Échantillonnage et contrôle de qualité dans les industries agroalimentaires [Sampling and quality control in the agri-food industries] (10-12/4/91, Paris). Information: C. Ducauze

- Information des laboratoires [Laboratory information] (27-29/11/91, Paris). Information: M. Feinberg

- Multivariate optimization and experimental design (26-28/5/91). Information: O. Kvalheim

- 7th COMETT School on Chemometrics (date to be announced later, Nijmegen). Information: L. Buydens

- Étude dans un domaine expérimental sans contrainte [Study in an unconstrained experimental domain] (18-22/3/91, LPRAI Marseille). Information: R. Phan-Tan-Luu

- Sensibilisation et principes de base [Introduction and basic principles] (15-19/4/91, LPRAI Marseille). Information: R. Phan-Tan-Luu

- Formulation et mélanges [Formulation and mixtures] (3-7/6/91, LPRAI Marseille). Information: R. Phan-Tan-Luu

- Méthodes modernes d’élaboration de matrices d’expériences optimales [Modern methods for constructing optimal experimental designs] (14-18/10/91, LPRAI Marseille). Information: R. Phan-Tan-Luu

- Sensibilisation et principes de base [Introduction and basic principles] (18-22/11/91, LPRAI Marseille). Information: R. Phan-Tan-Luu

- Criblage et étude des facteurs [Screening and study of factors] (9-13/12/91, LPRAI Marseille). Information: R. Phan-Tan-Luu

D.L.  MASSART 
News


Interlaboratory 
Testing  Award 
Nominations 

Nominations are now being accepted for the 1991 W.J. Youden Award in Interlaboratory Testing, sponsored by the American Statistical Association. The final date for receipt of nominations is April 1, 1991. The W.J. Youden Award in Interlaboratory Testing was established in 1985 to recognize publications that make outstanding contributions to the design and/or analysis of interlaboratory tests or describe ingenious applications to the planning and evaluation of data from interlaboratory tests. The award consists of US$1,000 and a suitable citation.

Eligible publications for the 1991 award must appear in professionally refereed journals or monograph series in 1989-1990. Nominations, along with 6 copies (in English) of the publication, should be sent to the Chair of the Award Committee, Paul vor Doehren, Searle, 4901 Searle Parkway, Skokie, IL 60077, U.S.A.


Book Review


Fourier Transforms in NMR, Optical, and Mass Spectrometry. A User’s Handbook, by A.G. Marshall and F.R. Verdun

Elsevier, Amsterdam, 1989, xvi + 450 pages, price Dfl. 220.00, US$107.25 (hardcover), Dfl. 95.00, US$46.25 (paperback), ISBN 0-444-87360-0 (hardcover), 0-444-87412-7 (paperback)

Fourier transforms are becoming increasingly important for a range of spectroscopic techniques. Some of these techniques, such as NMR and infrared spectrometry, are now performed almost exclusively using FT instruments.

The object of this book is to clarify the similarities and differences between the application of Fourier transforms to these different techniques. It provides, for the first time, a unified treatment of the mathematics of Fourier transforms and their application to the three most common forms of FT spectrometry. Despite the few limitations noted below, the aims of this book are achieved admirably.

The style of this book was obviously carefully thought out; the book is both easy to understand and very readable. Involved mathematics is avoided except where necessary, and extensive use is made of illustrations to clarify the most difficult points. Physical examples are also given frequently to show the relevance of particular theorems or concepts. A set of problems (with answers) is presented at the end of each chapter. These are particularly useful if the book is being used as a class text, but could also be valuable to readers who wish to consolidate their understanding of the material presented in each chapter. The only significant complaint about the style is that, because of the authors’ desire to keep the mathematics to a minimum, readers are regularly asked to verify a particular result for themselves. This is often justified, since most of this extra material would rarely be used and its inclusion would simply clutter the text. At other times, however, the added detail would be useful, and the fact that readers are required to verify it for themselves could be irritating.

The book consists of ten chapters, the first six of which cover general material, and the last four of which deal with specific types of Fourier transform spectrometry. Chapter 1 introduces spectral line shapes and explains the Fourier transform relationship between impulse response and continuous oscillation experiments. The origins of absorption-mode and dispersion-mode spectra are also covered.

Chapters 2 and 3 cover the mathematics of Fourier transforms of both continuous and discretely sampled waveforms. This includes topics such as dynamic range, aliasing, zero-filling, apodization, and phase correction.

The stated purpose of Chapter 4 is to deal with experimental aspects that are common to all types of Fourier transform spectrometry. Although this is generally true, a significant amount of the material presented has little or no relevance to FT-optical spectrometry.

Chapter 5 deals with the different sources of noise that can occur in FT spectrometry, and with which sources of noise lead to a multiplex advantage or disadvantage. The effects of signal averaging, dynamic range, and apodization on the signal-to-noise ratio are also discussed.

In Chapter 6, non-FT methods for converting data from the time domain to the frequency domain are explained and compared with the FT method. It is worth noting that these initial, general chapters are written mainly in the language of FT-NMR or FT-mass spectrometry, which is not always the same as that of FT-optical spectrometry. Because of this, readers wishing to learn about FT-optical spectrometry (in particular, FT-IR) may find them somewhat confusing, and a rather large portion of the material irrelevant. For readers who are mainly interested in NMR and/or mass spectrometry, however, these initial chapters provide an excellent and comprehensive introduction to FT spectrometry.

Chapters 7, 8 and 9 deal with aspects of FT spectrometry that are unique to FT-mass spectrometry, FT-nuclear magnetic resonance spectrometry, and FT-optical spectrometry, respectively. Of these chapters, the one on FT-optical spectrometry is by far the weakest. It is appreciably shorter than the other two chapters, yet attempts to deal with FT-infrared, FT-ultraviolet/visible, FT-Raman and Hadamard transform-Raman spectrometries. Consequently, none of these techniques is covered in enough detail to give anything more than a very basic introduction.

Although Chapter 9 is rather poor, the two chapters on FT-NMR and FT-mass spectrometry give a good overview of the current state of the art, and enough information to give a solid grounding in the field of interest.

Chapter 10 provides a brief review of the application of FT methods to other forms of spectrometry. Finally, five appendices are included which give integrals and theorems for FT applications, a description and program listings in FORTRAN and BASIC for the fast Fourier transform algorithm, a comprehensive atlas of Fourier transform pairs, and other useful data. These appendices are a good addition, and mean that the book certainly qualifies as “a user’s handbook”.

This book is clearly aimed at students and scientists who need to learn about several types of FT spectrometry, and it is an excellent text for this purpose. It should prove to be particularly useful both as a teaching text and as a general reference for Fourier transform methods as they are applied to spectrometry.

For newcomers to the fields of FT-NMR or FT-mass spectrometry, this is also an excellent introductory text, which puts the technique of interest into the context of other forms of FT spectrometry. Although those wishing to learn about FT-optical spectrometry may find this book rather confusing and the information in it somewhat limited, those who already have a good grounding in these techniques could gain considerable insight from its fresh look at old material.

Overall, this is a book to be recommended, and it should prove to be a valuable addition to many spectroscopists’ bookshelves.


RICHARD  S.  JACKSON  and 
PETER  R.  GRIFFITHS 
Department  of  Chemistry, 
The  University  of  Idaho, 
Moscow,  ID  83843,  U.S.A. 
Meeting Report


MADLUST 90,
Chemometrics
Towards 2000s
Tromsø, Norway,

2-6 July 1990

MADLUST 90 was the third in a series of workshop seminars on chemometrics. The previous two, ASTMULD (1984) and MULDAST (1987), were very much local to Scandinavian chemometricians developing the theories and tools now widely accepted throughout the world. For MADLUST, the organisers (Kim Esbensen, Norway; Paul Geladi and Michael Sjöström, Sweden; and Pentti Minkkinen, Finland) took the worthwhile decision to broaden the focus of the meeting to include people from industry who apply chemometrics to their particular problems. The hope was, of course, that the two groups would spark ideas off each other, and that hope was well realised. The meeting was organised around four main themes: Process Chemometrics, Statistics and Chemometrics, Chemometrics Towards 2000, and Image Analysis in Chemometrics. Each theme occupied a day, and discussions on the theme were focused by presentations from a small group of speakers. This arrangement meant that plenty of time was available for discussion.

The Process Chemometrics session was perhaps the major innovation of the meeting. The presentations were by John MacGregor (McMaster University, Canada), Roy Tranter (Glaxo Manufacturing Services, U.K.), Randy Pell (University of Washington, U.S.A.) and a trio from the University of Washington, U.S.A., representing the Center for Process Analytical Chemistry (Jim Burger, Marybeth Seasholtz and Yongdong Wang). The presentations and subsequent discussions highlighted three major areas, two of which are not normally considered by chemometricians. The interface between the process operator and chemometrics is very important and determines the acceptability of the method and, hence, its overall success. Part of the interface is the presentation of the results from the chemometrics, and the concept of having a visible, variable-sized dustbin for all unexplained or unexpected effects proved to be novel and challenging to some. The third area, locally weighted models, has proved valuable but clearly needs more theoretical development to be generally applicable.

The session on Statistics and Chemometrics was more concerned with the theoretical development of chemometrics and was presented by Tormod Næs (MATFORSK, Norway), Age Smilde (University of Groningen, The Netherlands), Hans Berntsen (SINTEF, Norway) and Agnar Höskuldsson (DIA-M, Denmark). Four quite different subjects were discussed: local modelling, the analysis of three-dimensional data arrays, the relation of the extended Kalman filter to bilinear modelling, and the optimisation of selecting t-vectors for inclusion in a PLS model. Each created considerable discussion, and the first two, at least, showed how some of the problems highlighted in the first session could be resolved.


Chemometrics Towards 2000 allowed elements of art and culture to be introduced into chemometrics, as well as consideration of some of the problems facing chemometrics. Erik Johansson (Hässle AB, Sweden), Willem Windig (Eastman Kodak, U.S.A.) and Harald Martens (Consensus Analysis A/S, Norway) raised the issue of the image of chemometrics in managers’ minds. There is a need for simplicity of approach and for the incorporation of techniques from outside chemometrics if chemometrics is to survive and develop as a viable subject. These talks, and the major discussion session subtitled ‘The Chemometrics User Speaks Back’, were a highlight of the week, as they clarified a number of ideas that could increase the acceptability and usefulness of chemometrics in many areas, particularly in industry.

The final session, Image Analysis, was presented by Ewert Bengtsson (Centre for Image Analysis, Sweden) and Hans Grahn (University of Umeå, Sweden). Here, the benefits of being able to extract from very large image data sets the parts of an image which are related to each other through chemical, physical or medical factors were well described. As these techniques are essentially non-destructive as far as samples are concerned, they have potential in process analysis, thus bringing the meeting back to its starting point.

R.L.  TRANTER 
Glaxo  Manufacturing  Services 
Ltd.,  Barnard  Castle, 
Co.  Durham,  U.K. 
Organizer’s summary


This was another important meeting where leading researchers in both the chemical and mathematical sciences exchanged ideas and discussed new results. There was ample time for participants to form new friendships and exchange ideas. One of the main benefits of these meetings is the chance to meet and get to know colleagues from other disciplines. Participants enjoyed wine tasting at a local winery during the second night of the conference. During the first night the participants had a banquet dinner with after-dinner speaker Dr. Herbert Hauptman, co-winner of the 1985 Nobel Prize in Chemistry. At my request, he gave a frank account of the difficulty of getting chemists to accept his and Karle's results. Some of these difficulties are presented in the written version of his talk. Also at my request, Dean Abe Clearfield of Texas A&M and Dr. E. Prince of the National Institute of Standards and Technology have included in the proceedings the comments they made following Dr. Hauptman’s talk.

As was the case at the 1985 Chemometrics Research Conference that I co-organized, most invited talks had invited discussants. Chemists' talks were discussed by a mathematician, and mathematicians' talks were discussed by a chemist. Some speakers were hard to classify as belonging to one field or the other. The main focus of the invited discussions was to explain and expand upon the main presentation for the broad audience.

The opening session was moderated by Lloyd Currie of NIST, and the opening speaker was Leon Gleser, whose talk demonstrated to the conference that measurement error models are often useful. The second talk was by Anne Thompson, who discussed chemical and statistical modeling in environmental science.

The second session dealt with making sense of multivariate data. Peter Jurs gave a survey of the use of clustering procedures in his laboratory.

The third session dealt with modeling in chemistry. Professor Steve Brown of the University of Delaware presented time series procedures for calibration, while Professor Don Watts of Queen's University demonstrated how useful profile and trace plots can be in obtaining interval estimates. The fourth session dealt with statistical mechanics issues, during which the audience was treated to interesting fractal plots and interpretations. The speakers were Fereydoon Family from Emory University, Dan ben-Avraham from Clarkson University, and David Weitz from Exxon Research Labs.

The sixth session gave an interesting description of how a graduate student in statistics working with a distinguished electrochemist can have an impact on chemistry. This talk was given by Janet Osteryoung. The second talk at this session was also based upon joint work, in this case by an agricultural chemist and a statistician, who gave interesting case-study examples of where PLS would and would not work. The third talk was an interesting statistical layout of receptor modeling given by Karen Bandeen-Roche.

In the next session, Phil Hopke gave a tutorial on the use of receptor modeling and Ron Henry gave a lecture about the use of optimization methods in environmental modeling.

We had a dynamic session on structural modeling that included talks by Ted Prince of NIST, Malcolm Gerloch of Cambridge University, and Milan Randić of Drake University. Ted talked about the use of maximum entropy techniques to resolve structure. (Ted says that since the conference he and some colleagues have made important advances.) Malcolm talked about ligand field theory and the electronic structure of inorganic complexes, and Milan gave an interesting talk on the use of graph theory as a companion procedure to the more commonly used clustering techniques.

The final session was about multivariate analysis and design, and it was enjoyed by all. Probably the humorous touch that many will remember for a long time is the ‘honors’ that Chris Nachtsheim tacked onto his name in the abstract, such as FRS and ASPCA, among others. Chris gave an interesting talk on experimental design, and Pat Carey gave examples of successful applications of PLS methods at Los Alamos.

C.H. SPIEGELMAN


Original Research Paper
Abstract


Hauptman, H.A., 1991. History of X-ray crystallography. Chemometrics and Intelligent Laboratory Systems, 10: 13-18.


In this brief sketch of the history of X-ray crystallography I emphasize the important role played by the development of the direct methods, which were devised to solve the central problem of X-ray crystallography, the so-called phase problem. I also stress the importance of cross-disciplinary research, in particular the essential role which mathematics played in this development.


INTRODUCTION 

In 1895 Wilhelm Röntgen discovered X-rays. With this discovery the stage was set for the creation of the modern science of X-ray crystallography.

In 1912 Paul Ewald was completing his doctoral dissertation concerned with the optical properties of a medium consisting of a regular arrangement of isotropic resonators. A crystalline solid, which on the sub-microscopic level consists of a triply periodic, regular arrangement of atoms or molecules, is therefore precisely the kind of medium with which Ewald was concerned. Since the smallest interatomic distances in a crystal are of the same order of magnitude as the wavelengths of X-rays, it occurred to Max von Laue, upon learning of Ewald's results, that a crystal might serve as a three-dimensional diffraction grating for X-rays. In order to test this hypothesis he prevailed upon the younger physicists Walter Friedrich and Paul Knipping to perform the necessary scattering experiment.

The scattering experiment indeed showed that when a beam of X-rays strikes a crystal, the crystal scatters the incident beam in many different directions and with different intensities. If these scattered X-rays strike a photographic plate, they blacken the plate at the points where they strike it. In this way one obtains the so-called diffraction pattern. This experiment marked the birth of the science of X-ray crystallography and, because of its fundamental importance in determining crystal and molecular structures, must be regarded as a landmark event in twentieth-century science. The major obstacle in the path leading from the observed diffraction pattern to the desired crystal structure is known as the phase problem, for reasons to be given shortly. I propose here to give a brief historical account of the methods devised to overcome this obstacle, the so-called direct methods of X-ray crystallography.
THE DIFFRACTION PATTERN

It has already been remarked that a crystal may be regarded as a regular, triply periodic arrangement of an array of atoms. One imagines three families of planes, the planes in each family being parallel to and equidistant from one another. In this way one obtains a tiling of the crystal space by congruent parallelepipeds, each one of which is said to be a fundamental parallelepiped, or unit cell, of the crystal.

If each unit cell contains a molecule (a collection of atoms) in its interior, and if the atoms are arranged in precisely the same way in all the unit cells, then each unit cell and its contents are indistinguishable from every other unit cell and its contents.

There corresponds to each atom an electron density function; hence, by superposition of the individual atomic electron density functions, one obtains an overall electron density function ρ(r), a nonnegative function of the position vector r which gives the number of electrons per unit volume at the position r. It is clear from the geometric construction that the electron density function in any unit cell is identical to that in every other unit cell. Hence ρ(r) is a triply periodic function of position, and this property may be taken as the mathematical definition of a crystal.
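The superposition and periodicity described above can be illustrated numerically. The following is a minimal one-dimensional sketch, not a crystallographic calculation: the cell length `CELL`, the atom list `ATOMS`, the electron counts, and the Gaussian atom shape of width `SIGMA` are all hypothetical choices made for illustration, as are the function names `atomic_density` and `rho`. Each atom contributes a density that integrates to its electron count, the periodic images of every atom in neighbouring cells are summed, and the resulting ρ(x) repeats with the cell length, the one-dimensional analogue of ρ(r) being triply periodic.

```python
import math

# Hypothetical 1D "crystal" (all values are illustrative assumptions).
CELL = 10.0    # unit-cell length, arbitrary units
SIGMA = 0.3    # width of the Gaussian used as a stand-in for an atomic density
ATOMS = [(0.10, 6.0), (0.35, 8.0), (0.70, 1.0)]  # (fractional x, electrons)


def atomic_density(d, electrons, sigma=SIGMA):
    """Gaussian centred on the atom; its integral over all x equals `electrons`."""
    return electrons * math.exp(-0.5 * (d / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def rho(x, n_images=3):
    """Electron density at x: superposition of every atom in the cell and its
    periodic images in the n_images neighbouring cells on each side."""
    total = 0.0
    for frac, electrons in ATOMS:
        for n in range(-n_images, n_images + 1):
            total += atomic_density(x - (frac + n) * CELL, electrons)
    return total


if __name__ == "__main__":
    # Periodicity: the density one cell length away is the same.
    print(math.isclose(rho(2.0), rho(2.0 + CELL), abs_tol=1e-12))  # True
    # Integrating rho over one cell recovers the electrons per cell (6 + 8 + 1).
    steps = 10_000
    dx = CELL / steps
    print(round(sum(rho(i * dx) for i in range(steps)) * dx, 6))  # 15.0
```

Summing a few periodic images is enough here only because the assumed Gaussians are narrow compared with the cell; images further away contribute nothing at double precision.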

If, on the other hand, we choose to regard a crystal as a triply periodic arrangement of an array of atoms or molecules, then by a crystal structure we mean simply the arrangement and identities of the atoms in the unit cell, and by a molecular structure the arrangement and identities of the atoms in the molecule.

It was recognized almost from the beginning that the diffraction pattern, that is, the directions and intensities of the X-rays scattered by a crystal, is uniquely determined by the crystal structure; which is to say that if one knew the crystal structure (the arrangement of the atoms in the crystal), then one could calculate the diffraction pattern completely. It turns out that, conversely, diffraction patterns in general determine unique crystal and molecular structures, although this fact was not known until many years later. In short, the information content of a typical molecular structure coincides precisely with the information content of its diffraction pattern. It is a measure of the great advances made by the new science of X-ray crystallography that one nowadays can routinely transform the information content of a diffraction pattern into a molecular structure, at least for the so-called ‘small’ molecules, that is, those consisting of some 150 or fewer non-hydrogen atoms.
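To make concrete the statement that a known structure fixes the diffraction pattern, here is a minimal sketch assuming point atoms at fractional coordinates with constant scattering factors (simplifications not taken from the text; real atomic scattering factors fall off with scattering angle). Each reflection h has a structure factor F(h) = Σ_j f_j exp(2πi h·r_j); the experiment records the intensity |F(h)|² but not the phase arg F(h). The three-atom cell is hypothetical.

```python
import numpy as np

def structure_factor(hkl, positions, f):
    """Structure factor F(h) = sum_j f_j exp(2*pi*i h . r_j) for point atoms.

    hkl       -- integer Miller indices, length 3
    positions -- fractional atomic coordinates, shape (N, 3)
    f         -- scattering factors (taken as constants here), shape (N,)
    """
    phase_args = 2 * np.pi * positions @ np.asarray(hkl, dtype=float)
    return np.sum(f * np.exp(1j * phase_args))

# Hypothetical three-atom cell (fractional coordinates) with made-up scattering factors.
positions = np.array([[0.10, 0.20, 0.30],
                      [0.40, 0.15, 0.70],
                      [0.75, 0.60, 0.05]])
f = np.array([6.0, 7.0, 8.0])

F = structure_factor((1, 2, 0), positions, f)
intensity = abs(F) ** 2    # what a diffraction experiment measures
phase = np.angle(F)        # what it does not measure
print(intensity, phase)
```

Every reflection is computed directly from the atomic coordinates, which is the forward direction the text describes; the phase problem is the inverse direction, with `phase` unavailable.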


THE  PHASE  PROBLEM 

Since X-rays, like ordinary visible light, are electromagnetic waves, they have a phase as well as an intensity, just like any other wave disturbance. In order to work backwards, from diffraction patterns to crystal and molecular structures, it turns out to be necessary to measure not only the intensities of the X-rays scattered by the crystal but their phases as well. However, the phases cannot be measured in the ordinary kind of diffraction experiment; they appear to be irretrievably lost. Only the intensities can be directly measured. This gives rise to the central problem of X-ray crystallography, the so-called phase problem: how to deduce the values of the phases of the X-rays scattered by a crystal when only their intensities are known. For some forty years after the landmark experiment of Friedrich and Knipping, all attempts to find a general method for going directly from the diffraction pattern (that is, from the measured intensities alone) to the crystal structure, with or without the intervention of the phases, were defeated; what was sought was a method useful for the complex structures of interest to chemists, biologists, and mineralogists.

In fact, because the needed phase information was lost in the diffraction experiment, it was thought that one could use arbitrary values for the phases associated with the measured intensities of the scattered X-rays. In this way one obtains a myriad of different crystal structures, all consistent with the known intensities. It therefore came to be generally believed that a procedure for going directly from the measured intensities to crystal structures could not, even in principle, be devised. By the same mode of thinking, the problem of deducing the values of the individual phases from the diffraction intensities, the so-called phase problem, was also thought to be unsolvable, even in principle. It wasn't until the early 1950s, through the exploitation of special properties of molecular structures and through a simple mathematical argument, that these erroneous conclusions were finally refuted.

Atomicity 

The special property that all crystal and molecular structures possess may be summed up in one word: atomicity. Thus the electron density function ρ(r) in a crystal takes on large positive values at the atomic position vectors and drops to small values between the atoms. If our goal is merely to determine the positions of the atoms, that is, the positions of the maxima of ρ(r), rather than the much more complicated electron density function associated with the distribution of atoms in the crystal, then our problem is greatly simplified: it turns out to be not only determinate but actually greatly overdetermined by the available X-ray diffraction intensities.

This is most easily seen by eliminating the lost phase information from the relationships between the diffraction pattern and the crystal structure. Doing this results in a system of equations relating the diffraction intensities alone to the atomic position vectors. Because the number of these relationships far exceeds (by a factor of ten or so) the number of unknown position vectors needed to define the crystal structure, our problem is greatly overdetermined. Thus it is clear that there exist relationships between the measured diffraction intensities and the lost phases that may be exploited. It follows that the phases of the scattered X-rays are also determined by their intensities. In short, the lost phase information is to be found among the available intensities, and the phase problem is therefore a solvable one, at least in principle. There remains the task of devising numerical algorithms leading from the abundance of experimentally measured diffraction intensities to the values of the individual phases. The techniques of X-ray crystallography that deduce the individual phases by exploiting relationships between measured diffraction intensities and phases are known as direct methods.

The argument just presented was in fact anticipated in 1927 by Heinrich Ott [1], who showed by algebraic analysis and applications that the method is capable of solving simple centrosymmetric structures, in which all phases must be either 0 or π. The method was further elaborated by Kedareswar Banerjee in 1933 [2] and Melvin Avrami in 1938 [3] but was clearly of only limited value in applications. While this early work of Ott, Banerjee and Avrami shed important light on the more general phase problem, it attracted little attention at the time and was not further developed; it now appears to be all but forgotten.

Solving  the  phase  problem 

My work on this problem started in 1948, about a year after I joined the Naval Research Laboratory in Washington, DC, and commenced my collaboration with Jerome Karle. It had been some 35 years since Friedrich and Knipping had carried out their famous experiment, and by 1947 the phase problem, the central problem of X-ray crystallography, was still unsolved and generally regarded as unsolvable. The central importance of this problem and its strong mathematical component combined to provide a challenge that could not be denied.

Then too, there was a certain air of mystery surrounding the problem. On the one hand, the simplicity and logic of the argument "proving" its unsolvability, even in principle, appeared to be overwhelming. On the other hand, crystal and molecular structures were being solved, although the structures studied were almost always either very simple ones involving a small number of atoms or larger structures containing one or a small number of heavy atoms, for which special techniques had been devised. It had not yet been generally understood that the implicit assumption of atomicity and the concomitant trial-and-error approach to most structure solutions had imposed a powerful restriction on the permitted values of the phases.

The first important contribution that Karle and I made was the recognition that it would be necessary to exploit prior structural knowledge to transform the phase problem from an unsolvable one to one that was solvable, at least in principle. Our first step in this direction was to exploit the nonnegativity of the electron density function ρ(r). Before our analysis was complete, however, David Harker and John Kasper published their famous paper [4], in which they derived inequalities by which the measured intensities restrict the possible values of the phases. This was a very mysterious paper, because nowhere in it is there any explicit mention of the basis for the inequality relations, and indeed the most important fact is conspicuous by its absence. It is simply that the electron density function is nonnegative everywhere. This fact is, however, implicit in Harker and Kasper's work. In very short order Karle and I completed our own analysis and derived the complete set of inequality relationships based on the nonnegativity of the electron density function [5]. It includes the Harker-Kasper inequalities as a special case, and many others besides. Although the complete set of inequalities greatly restricts the values of the phases, the relations appear to be too intractable to be useful in applications, except for the simplest structures, and their potential has never been fully exploited.

The recognition in 1950 and 1951 that molecules consist of atoms that, to a good approximation, may be regarded as points completely transformed the nature of the phase problem. While it meant accepting as fact that the observed diffraction intensities by themselves were indeed not sufficient to determine a unique electron density function, it also meant that they were more than sufficient, by far, to determine the atomic position vectors [6]. It meant as well that the phases corresponding to the point-atom structure were greatly overdetermined by the available intensities. Finally, it meant that a formidable psychological barrier had been removed, because it now made sense to look for a solution to the phase problem, that is, for numerical algorithms leading from measured intensities to individual phases. In hindsight it is perfectly clear that, owing to the great overabundance of diffraction data, a probabilistic approach is called for; some 40 years ago, however, this was not so apparent.


Before we could even get started, an unexpected complication arose. Because the values of the individual phases depend not only on the crystal structure but also on the choice of origin, they are not uniquely determined by the crystal structure alone. It followed that the diffraction intensities alone do not determine unique values for the phases, and the process leading from diffraction intensities to phases would have to include a recipe for specifying the origin. This required that we separate out two contributions to a phase, one due to the crystal structure alone and one due to the choice of origin. We clearly needed to study how a phase is transformed when the origin is shifted, a problem that was complicated by the fact that the permissible origins depend on the crystallographic elements of symmetry, which were usually known in advance.
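The origin dependence can be checked numerically. In the minimal sketch below, assuming point atoms, equal constant scattering factors, and no symmetry constraints on the origin (space group P1, an assumption not in the text), moving the origin by a vector t replaces each r_j by r_j − t, so F(h) is multiplied by exp(−2πi h·t): the intensity is unchanged while the phase shifts by −2π h·t. The structure and shift are random or invented.

```python
import numpy as np

def structure_factor(hkl, positions, f):
    # F(h) = sum_j f_j exp(2*pi*i h . r_j) for point atoms (fractional coordinates)
    return np.sum(f * np.exp(2j * np.pi * positions @ np.asarray(hkl, dtype=float)))

rng = np.random.default_rng(0)
positions = rng.random((5, 3))        # hypothetical structure, 5 point atoms
f = np.full(5, 6.0)                   # equal scattering factors (assumption)
shift = np.array([0.25, 0.10, 0.40])  # move the origin by this vector
shifted = (positions - shift) % 1.0   # same crystal, new coordinates

hkl = (2, -1, 3)
F_old = structure_factor(hkl, positions, f)
F_new = structure_factor(hkl, shifted, f)

# Intensity is unchanged; the phase changes by -2*pi*h.t (mod 2*pi)
print(abs(F_old) ** 2, abs(F_new) ** 2)
expected = np.angle(F_old) - 2 * np.pi * np.dot(hkl, shift)
print(np.angle(F_new), np.angle(np.exp(1j * expected)))
```

Reducing the shifted coordinates modulo 1 does not affect F, because h has integer components.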

The solution was made easier by the discovery that there are always certain linear combinations of the phases, the so-called structure invariants, that are uniquely determined by the crystal structure alone and are independent of the choice of origin. It is therefore only the values of the structure invariants that we can hope to estimate from the measured intensities. Once we have estimated a sufficient number of these, we can then hope to evaluate the individual phases by a process that incorporates a recipe for specifying the origin.
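The simplest example of a structure invariant is the three-phase sum φ_h + φ_k + φ_l with h + k + l = 0: an origin shift t adds −2π(h + k + l)·t = 0 to it. The sketch below checks this numerically under the same assumptions as a point-atom model (equal constant scattering factors, no symmetry, random hypothetical atoms); the particular reflections are arbitrary choices.

```python
import numpy as np

def phase(hkl, positions, f):
    # phase of F(h) = sum_j f_j exp(2*pi*i h . r_j), point atoms
    F = np.sum(f * np.exp(2j * np.pi * positions @ np.asarray(hkl, dtype=float)))
    return np.angle(F)

def triplet(h, k, positions, f):
    # three-phase invariant phi_h + phi_k + phi_l with l = -h - k, wrapped to (-pi, pi]
    l = -np.asarray(h) - np.asarray(k)
    total = phase(h, positions, f) + phase(k, positions, f) + phase(l, positions, f)
    return np.angle(np.exp(1j * total))

rng = np.random.default_rng(1)
positions = rng.random((6, 3))   # hypothetical structure, 6 point atoms
f = np.full(6, 6.0)
h, k = (1, 0, 2), (0, 3, -1)

reference = triplet(h, k, positions, f)
for _ in range(3):
    t = rng.random(3)
    shifted = (positions - t) % 1.0
    # the individual phase changes with t, but the triplet sum does not
    print(phase(h, shifted, f), triplet(h, k, shifted, f), reference)
```

Comparing the triplets through exp(iΦ) rather than directly avoids spurious mismatches when a value wraps around ±π.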

What was clearly called for was a method for identifying the structure invariants, and then for using these to devise recipes for fixing the origin appropriate to the different elements of crystallographic symmetry that may be present. Once this was done, there would remain the task of estimating the values of the structure invariants by means of their conditional probability distributions, assuming that an appropriately chosen set of diffraction intensities is known.

Probabilistic  techniques 

Beyond any doubt our most important contribution during the early 1950s was the introduction of probabilistic techniques, in particular the use of the joint probability distribution of several diffraction intensities and the corresponding phases, as the central tool in the solution of the phase problem [7]. We assumed to begin with that all positions of the atoms in the unit cell of the crystal were equally likely or, in the language of mathematical probability, that the atomic position vectors were random variables, uniformly and independently distributed. With this assumption the intensities and phases of the scattered X-rays, as functions of the atomic position vectors, are also random variables, and one can use the methods of modern mathematical probability theory to calculate the joint probability distribution of any collection of intensities and phases. A suitably chosen joint probability distribution leads directly to the conditional probability distribution of a specified structure invariant, assuming again an appropriately chosen set of known diffraction intensities. The conditional distribution in turn yields an estimate of the structure invariant, given, for example, by its most probable value. Once one has a sufficiently large number of sufficiently reliable estimates of structure invariants, one can use standard techniques to calculate the values of the individual phases, provided that the process incorporates a recipe for specifying the origin.
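As one concrete instance of such a conditional distribution, the sketch below evaluates the standard textbook result for the three-phase invariant Φ in a noncentrosymmetric structure of N equal, randomly placed atoms: P(Φ) = exp(κ cos Φ) / (2π I₀(κ)), with κ = 2|E_h E_k E_l| / √N, where the E are normalized structure-factor magnitudes. This formula is not given in the article; the E values and N below are hypothetical. The most probable value, used as the estimate, is Φ = 0, and larger κ makes that estimate more reliable.

```python
import numpy as np

def triplet_distribution(phi, E_h, E_k, E_l, n_atoms):
    """Conditional density of the triplet invariant Phi given |E_h|, |E_k|, |E_l|.

    Standard von Mises form for N equal, randomly placed atoms (noncentrosymmetric):
        P(Phi) = exp(kappa * cos Phi) / (2 * pi * I0(kappa)),
        kappa  = 2 * |E_h E_k E_l| / sqrt(N)
    """
    kappa = 2.0 * abs(E_h * E_k * E_l) / np.sqrt(n_atoms)
    return np.exp(kappa * np.cos(phi)) / (2.0 * np.pi * np.i0(kappa)), kappa

phi = np.linspace(-np.pi, np.pi, 2001)
density, kappa = triplet_distribution(phi, 2.5, 2.2, 1.9, n_atoms=40)  # hypothetical values

most_probable = phi[np.argmax(density)]           # the estimate: the mode, Phi = 0
area = np.sum(density[:-1]) * (phi[1] - phi[0])   # normalization check (close to 1)
print(kappa, most_probable, area)
```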

Although probabilistic methods played an essential role in the development of the direct method and provided it with its logical foundation, it must also be pointed out that non-probabilistic methods played an important part. In this connection the early work of Sayre [8], Zachariasen [9], Cochran [10] and Woolfson [11] should be mentioned. In particular, the well-known Sayre equation, a relationship of fundamental importance among measured magnitudes and unknown phases, continues to be useful to the present day and lies at the heart of some of the more successful computer programs for solving crystal structures.
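For reference, the Sayre equation is usually written as follows for a structure of equal, resolved atoms. This is the standard textbook form, not a formula given in this article: V is the unit-cell volume, the sum runs over all reciprocal-lattice vectors k, f_h is the atomic scattering factor, and g_h is the scattering factor of the "squared" atom.

$$
F_{\mathbf h} \;=\; \frac{\theta_{\mathbf h}}{V}\sum_{\mathbf k} F_{\mathbf k}\,F_{\mathbf h-\mathbf k},
\qquad
\theta_{\mathbf h} = \frac{f_{\mathbf h}}{g_{\mathbf h}}
$$

Multiplying both sides by F_{−h} makes the left side the real, positive quantity |F_h|². This suggests why the largest terms on the right tend to have phase sums φ_{−h} + φ_k + φ_{h−k} near zero, which connects the equation to the probabilistic estimates of structure invariants.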


CONCLUDING  REMARKS 

I cannot conclude this brief account of the early history of the direct methods of X-ray crystallography without also describing the reception this work received at the hands of the crystallographic community. This was, simply, extreme skepticism, if not outright hostility. In hindsight I think this reaction was due, first, to the strong mathematical flavor of this early work, not well understood by most crystallographers, and second, to the ingrained and almost universal belief that the phase problem was unsolvable in principle and that any claim to the contrary must therefore be flawed. This nearly universal skepticism and inability to understand the proposed solution no doubt explains why so few early attempts to apply the new methods were made. It wasn't until the 1960s, when easy-to-use computer programs became available, that widespread applications were made.

Today some 100,000 molecular structures are known, most of them determined by the direct methods, and about 5000 new structures are added to the list every year. It is no exaggeration to say that modern structural chemistry owes its existence to this development.

Although no equations are shown in this article, it should be clear that the developments described here would not have been possible without strong dependence on mathematical techniques, in particular the modern theory of mathematical probability, and it is this interaction between mathematics and the phase problem of X-ray crystallography which I have tried to emphasize in this article. Work on the phase problem continues to this day, and applications to structures of ever-increasing complexity continue to be made. It still appears that progress is made only in proportion to our ability to bring more powerful mathematical techniques to bear on this fascinating problem.
My own career as a crystallographer corresponds very closely with the development of direct methods of phase determination. In fact, my first exposure to crystallography was in the summer of 1949 when, freshly out of college, I had a temporary job in the laboratory of David Harker and John Kasper, who had recently completed the determination of the structure of decaborane, the first structure to be determined ab initio from diffraction data alone. I was an interested spectator during the early 1950s, when the work of Herbert Hauptman and Jerome Karle was the subject of sometimes bitter controversy, and I have a particularly vivid memory of an American Crystallographic Association meeting that was held at Harvard in the spring of 1954. (I can be absolutely positive about the date, because I was working at the time at Bell Labs, in New Jersey, while my fiancee was teaching in a school in the Boston suburbs; I had a strong incentive to get to that meeting.) The program at this meeting had a series of half a dozen papers whose titles were variations on the theme "Why the methods proposed by Hauptman and Karle won't work." These were followed by a paper by Clark, Evans and Christ, of the U.S. Geological Survey, entitled "The Structure of Colemanite, Solved Using the Methods of Hauptman and Karle." This paper did not completely silence the opposition (I remember also a rather sharp exchange between Jerry Karle and Michael Woolfson, who was later to become one of the leaders in the development of direct methods, at a meeting at Cornell in 1959), but acceptance of the ideas of direct methods had become quite general by the early 1960s.
I remember well, as a student, attending the first presentation to the crystallographic community, by Herb Hauptman and Jerry Karle, of their ideas on solving the phase problem. I believe it was an American Crystallographic Association Meeting at the University of Michigan. We were all assembled in a large auditorium and, as Dr. Hauptman has stated, the presentation was quite mathematical. At the completion of the talks there was a moment of stunned silence; then many hands shot up to ask questions, I thought. Instead, each of the then leading lights of crystallography felt obligated to reveal their own brilliance by putting these two young upstarts in their place. They began to criticize the methods and tried to point out the fallacy in the Hauptman-Karle approach. During this heated discussion, my major professor, Dr. Philip Vaughan, leaned over and said to me, "These guys really have something." Phil was only three years out of Caltech, having worked with Linus Pauling and then as a postdoc with Eddie Hughes. Phil later went on to make his own modest contribution to ‘Direct Methods’ but then gave up what surely would have been a brilliant career to take over the family geology instruments business.

Much later, when Herb Hauptman came to the Medical Foundation of Buffalo, his initial experimental group included Bill Duax, who worked as a postdoc with me, and Dave Smith, my first Ph.D. student. Later my second Ph.D. student, Bob Blessing, joined the group. These now senior-level scientists, along with the other bright younger members of the group, have solved some exceedingly difficult problems in biological systems as part of the overall effort to apply ‘Direct Methods’ to crystallographic problems. The power of the method is still being developed and gives promise of revealing to us the intricate secrets of both the mineral and living worlds.


0169-7439/91/$03.50 © 1991 Elsevier Science Publishers B.V.


Tutorial




Chemometrics and Intelligent Laboratory Systems, 10 (1991) 21-43
Elsevier Science Publishers B.V., Amsterdam


An  introduction  to  receptor  modeling 

Philip  K.  Hopke 

Department of Chemistry, Clarkson University, Potsdam, NY 13676 (U.S.A.)
(Received  8  November  1989;  accepted  15  February  1990) 

CONTENTS

Abstract
1 Introduction
2 Principle of mass conservation
3 Chemical mass balance
3.1 Introduction
3.2 Previous applications
3.3 Illustrative example of CMB analysis
3.3.1 Initial chemical mass balance
3.3.2 Second chemical mass balance
3.3.3 Total carbon results
3.3.4 Conclusions
4 Multivariate receptor models
4.1 Introduction
4.2 Mathematical procedures
4.3 Previous applications
4.4 Illustrative example
4.4.1 Data description
4.4.2 Results
5 Summary
Acknowledgements
References

Abstract 

HopVc,  P.K..  1991.  An  introduction  to  receptor  modeling.  Chemometrics  and  Intelligent  Laboratory  S) stems,  10.  21-43. 

A major problem facing air quality management personnel is the identification of sources of airborne particles and the quantitative apportionment of the aerosol mass to those sources. The ability to collect particle samples and analyze these samples for a suite of elements by such techniques as neutron activation analysis or X-ray fluorescence provides the data for the problem of resolving a series of complex mixtures into their components, based on the profiles of the elements emitted by the various sources in the airshed. If all of the sources and their composition profiles are known, then the mass balance model becomes a multiple regression problem. If a series of samples has been analyzed without substantial information being available on the sources, factor analysis methods can be employed. In both situations, the analysis is complicated by higher levels of measurement error in these analyses than in typical spectrochemical problems. In addition, the source profiles can vary as the composition of input materials for the emission sources changes in time. Thus, there are limitations to the ability of statistical methods to resolve sources in real-world problems. The physical and statistical basis of these methods and their application to representative problems will be reviewed.


1  INTRODUCTION 

The advent of a U.S. national ambient air quality standard for total suspended particles (TSPs) in the early 1970s created the need to identify particle sources so that effective control strategies could be designed and implemented. The initial efforts at identification of particle sources focused on dispersion models of point sources and, in most cases, resulted in substantial reductions in TSP levels. However, as the increment of additional control needed to reach standard levels became smaller, the model uncertainties led to difficulties in identifying the actual sources of continuing problems. In addition, fugitive and other non-ducted emissions are generally not treated, or are poorly handled, in these models. Thus, additional methods were required to identify and quantitatively apportion particle mass to sources. These new methods are called receptor models. In them, the measured properties of the collected ambient samples are used to infer the contributions of the sources to the ambient pollutant concentration. These methods require that samples be obtained at locations of interest (receptor sites) and that the samples so collected be analyzed for the properties that are characteristic of the pollutant sources.

These requirements have arisen at a time when new analytical methods have been developed that permit multielemental analysis of large numbers of airborne particle samples or microscopic characterization of large numbers of individual particles. Thus, large databases on the composition of airborne particles are available for use in these receptor models. Although much of the thrust of the model developments has been aimed at identification of sources of particle mass, the models can also be used to elucidate the origins of the various measured species observed in the samples. It then becomes possible to quantitatively apportion observed airborne concentrations, such as that of airborne lead, among the various source types.

The importance of receptor models as air quality management tools in the U.S. has recently been substantially increased by the promulgation of a new ambient air quality standard for particulate matter. This new standard requires all of the state and local air quality planning agencies to revise their plans for improving air quality and reducing the particulate concentrations where they are expected to exceed the prescribed levels. In the associated guidance documents provided by the U.S. Environmental Protection Agency [1], receptor models are explicitly approved for use in this planning process along with the traditional dispersion models. Thus, receptor models have now become an accepted part of the regulatory process for air quality management.

This paper will outline several of the applicable models, provide examples of their use in apportioning materials in a number of airsheds, and demonstrate how they can identify the influence of emissions on the overall airborne particle concentrations.


2  PRINCIPLE  OF  MASS  CONSERVATION 

All of the currently used receptor models are based on the assumption of mass conservation and the use of a mass balance analysis. For example, let us assume that the total airborne particulate lead concentration (ng/m³) measured at a site can be considered to be the sum of contributions from independent source types such as motor vehicles, incinerators, smelters, etc.:

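$$
\mathrm{Pb}_{\mathrm{total}} = \mathrm{Pb}_{\mathrm{auto}} + \mathrm{Pb}_{\mathrm{incin}} + \mathrm{Pb}_{\mathrm{smelter}} + \cdots \tag{1}
$$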
However, a motor vehicle burning leaded gasoline emits particles containing materials other than lead. Therefore, the atmospheric concentration of lead from automobiles in ng/m³, $\mathrm{Pb}_{\mathrm{auto}}$, can be considered to be the product of two factors: the gravimetric concentration (ng/mg) of lead in automotive particulate emissions, $a_{\mathrm{Pb,auto}}$, and the mass concentration (mg/m³) of automotive particles in the atmosphere, $f_{\mathrm{auto}}$:

$$
\mathrm{Pb}_{\mathrm{auto}} = a_{\mathrm{Pb,auto}}\, f_{\mathrm{auto}} \tag{2}
$$

The normal approach to obtaining a data set for receptor modeling is to determine a large number of chemical constituents, such as elemental concentrations, in a number of samples. The mass balance equation can thus be extended to account for all m elements in the n samples as contributions from p independent sources:

$$
x_{ij} = \sum_{k=1}^{p} a_{ik}\, f_{kj}, \qquad i = 1,\dots,m; \quad j = 1,\dots,n \tag{3}
$$

where $x_{ij}$ is the $i$th elemental concentration measured in the $j$th sample, $a_{ik}$ is the gravimetric concentration of the $i$th element in material from the $k$th source, and $f_{kj}$ is the airborne mass concentration of material from the $k$th source contributing to the $j$th sample. There are several different approaches to receptor model analysis that have been successfully applied, including chemical mass balance (CMB) and multivariate receptor models such as principal components analysis and target transformation factor analysis (TTFA). These models can be applied to both particulate and gaseous species. The basis for each of these methods will be presented in subsequent sections of this paper, along with examples of their application to the identification of pollution sources in the atmosphere.


3  CHEMICAL  MASS  BALANCE 
3.1  Introduction 

The chemical mass balance (CMB), sometimes called the chemical element balance, solves eq. (3) directly for each sample by assuming that the number of sources and their compositions at the receptor site are known. This approach was first independently suggested by Winchester and Nifong [2] and by Miller et al. [3]. The composition of an ambient sample is then used in a multiple linear regression against source compositions to derive the mass contribution of each source to that particular sample. Miller et al. [3] modified eq. (3) to explicitly include changes in composition of the source material while in transit to the receptor:

$$
x_{ij} = \sum_{k=1}^{p} \alpha_{ik}\, a'_{ik}\, f_{kj} \tag{4}
$$

where $\alpha_{ik}$ is the coefficient of fractionation, so that if $a'_{ik}$ were the composition of the particles as emitted by the source, $a_{ik} = \alpha_{ik} a'_{ik}$ is the composition of the particles at the receptor site. In practice, it is generally impossible to determine the $\alpha_{ik}$ values, and they are assumed to be unity ($a_{ik} = a'_{ik}$).

3.2 Previous applications

Early applications of this approach to urban aerosol mass apportionment included Pasadena, CA (4), Heidelberg, Germany (5), Ghent, Belgium (6), and Chicago, IL (7). In all of these analyses, the quality of the available source compositions severely limited the precision with which the ambient compositions could be reproduced.

Several major research efforts have subsequently produced substantially better source data. These source emission studies led to much improved resolution of the particle sources in Washington, DC (8,9). In one of these studies, Kowalczyk et al. (8) used a weighted least-squares regression analysis to fit 6 sources with 8 elements for 10 ambient samples. In these analyses, each ambient elemental concentration is weighted by the inverse of the square of the analytical uncertainty in that measurement.
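The weighting described above can be sketched as a weighted least-squares solution of eq. (3) for a single sample. This is a minimal illustration, not the software used in these studies: the function names `solve` and `cmb_wls` and the two-source, three-element profiles are hypothetical, and the normal equations are solved by plain Gaussian elimination to keep the example dependency-free.

```python
def solve(m, v):
    """Solve the square system m @ x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def cmb_wls(a, x, sigma):
    """Weighted least-squares CMB fit for one ambient sample.

    a[i][k]  -- mass fraction of element i in source k (source profiles)
    x[i]     -- ambient concentration of element i
    sigma[i] -- analytical uncertainty of x[i]; weights are 1 / sigma**2
    Returns the source mass contributions f[k].
    """
    n_el, n_src = len(a), len(a[0])
    w = [1.0 / s ** 2 for s in sigma]
    # Weighted normal equations: (A^T W A) f = A^T W x
    ata = [[sum(w[i] * a[i][k] * a[i][l] for i in range(n_el)) for l in range(n_src)]
           for k in range(n_src)]
    atx = [sum(w[i] * a[i][k] * x[i] for i in range(n_el)) for k in range(n_src)]
    return solve(ata, atx)


# Hypothetical profiles (rows: elements, columns: sources) and a noise-free
# ambient sample built from source contributions of 10 and 20.
a = [[0.30, 0.01],
     [0.02, 0.20],
     [0.05, 0.05]]
x = [3.2, 4.2, 1.5]
sigma = [0.1, 0.2, 0.1]
f = cmb_wls(a, x, sigma)
print(f)
```

Because the synthetic sample is exactly consistent with the profiles, the fit recovers the contributions used to build it; with real data the weights decide how much each element's misfit is tolerated.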

Subsequently, Kowalczyk et al. (9) examined 130 samples using 7 sources with 28 elements included in the fit. Although 28 elements were used in the fitting process, the fit did not change appreciably with the number of elements included, with the exception of some of the key tracer elements such as Na, Pb, and V. Cheng and Hopke (10) have recently examined these data using a variety of regression diagnostics. They found that these 'marker' elements can be clearly identified and that their influence on the quality of the fit to the ambient data and on the source mass contributions can be estimated quantitatively.
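One standard regression diagnostic that flags such marker elements is the leverage of each element in the weighted fit, i.e. the diagonal of the hat matrix. The sketch below illustrates that idea only; we do not claim it reproduces the particular diagnostics used by Cheng and Hopke (10). The element names and profiles are hypothetical, and equal uncertainties are assumed for simplicity.

```python
def solve(m, v):
    """Solve the square system m @ x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def leverages(a, sigma):
    """Diagonal of the weighted hat matrix H = W^1/2 A (A^T W A)^-1 A^T W^1/2.

    h[i] = w[i] * a_i^T (A^T W A)^-1 a_i, with a_i the profile row of element i.
    Values near 1 mark elements that largely determine the fit by themselves.
    """
    n_el, n_src = len(a), len(a[0])
    w = [1.0 / s ** 2 for s in sigma]
    ata = [[sum(w[i] * a[i][k] * a[i][l] for i in range(n_el)) for l in range(n_src)]
           for k in range(n_src)]
    h = []
    for i in range(n_el):
        z = solve(ata, a[i])  # (A^T W A)^-1 a_i
        h.append(w[i] * sum(a[i][k] * z[k] for k in range(n_src)))
    return h


elements = ["V", "Fe", "Si", "S"]
# Hypothetical profiles: column 0 = soil, column 1 = oil combustion
a = [[0.00, 0.02],
     [0.04, 0.01],
     [0.25, 0.01],
     [0.01, 0.10]]
h = leverages(a, [1.0] * len(a))
for name, hi in zip(elements, h):
    print(f"{name}: leverage {hi:.3f}")
```

In this made-up case Si and S dominate the soil and oil profiles, respectively, and each has a leverage close to 1; the leverages always sum to the number of fitted sources.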

The elemental balance sheet allows the identification of the major sources of metals in the air. For example, vanadium and nickel arise primarily from oil-fired power plant emissions: 23 of 25 ng/m³ for V and 4.0 of 17 ng/m³ for Ni, with most of the nickel unexplained. Subsequent studies have shown that Kowalczyk et al. (9) used an unusually low Ni/V ratio for the oil-fired power plant profile, which led to the underprediction of Ni. Zinc is mainly released by incinerators but also comes from motor vehicles (51 ng/m³ from refuse incineration and 7.3 ng/m³ from motor vehicles). The reverse is true for lead, with motor vehicles as the primary source and refuse incineration as a lesser but important source: 428 ng/m³ from motor vehicles and 34 ng/m³ from incineration. In this way, the sources of both the particulate mass and specific elements can be identified.
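Each entry of the elemental balance sheet is the product $a_{ik} f_k$ of a profile value and a fitted source contribution. In the sketch below, the profile values are hypothetical, back-calculated so that assumed contributions of 500 ng/m³ from incineration and 4000 ng/m³ from motor vehicles reproduce the Zn and Pb apportionments quoted above; the function name `balance_sheet` is ours.

```python
def balance_sheet(profiles, contributions):
    """Return {element: {source: a_ik * f_k}} for one fitted sample.

    profiles[element][source] -- mass fraction a_ik of the element in the source
    contributions[source]     -- fitted source mass contribution f_k
    """
    return {el: {src: frac * contributions[src] for src, frac in row.items()}
            for el, row in profiles.items()}


# Hypothetical profile values, back-calculated from the quoted contributions.
profiles = {
    "Zn": {"incinerator": 0.102, "motor vehicle": 0.001825},
    "Pb": {"incinerator": 0.068, "motor vehicle": 0.107},
}
contributions = {"incinerator": 500.0, "motor vehicle": 4000.0}  # ng/m3, assumed

sheet = balance_sheet(profiles, contributions)
for element, row in sheet.items():
    total = sum(row.values())
    shares = {src: round(v / total, 3) for src, v in row.items()}
    print(element, row, shares)
```

Comparing each row sum with the measured concentration of that element gives the fraction explained, as in the 23 of 25 ng/m³ quoted for V.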

Mayrsohn and Crabtree (11) presented the use of an iterative least-squares approach to apportion 6 sources of airborne hydrocarbon compounds. The sources were automotive exhaust, volatilization of gasoline, release of gasoline vapor, commercial natural gas, geological natural gas, and liquefied petroleum gas. They performed the least-squares fit to the concentrations of eight hydrocarbon compounds determined by gas chromatography. Their ordinary least-squares source reconciliation algorithm recognized that not all sources may contribute to every sample; if negative contributions were obtained, a different configuration of sources was employed, subject to certain qualifying assumptions (12). Each possible configuration with positive coefficients was considered, and the one with the lowest standard error was chosen as the optimum solution. On average, automotive exhaust was the source of almost 50% of the observed hydrocarbons. Gasoline and its vapor contributed 30-30% by weight, and the balance resulted from commercial and geological natural gas. Thus, automobiles and other highway-related sources were responsible for the majority of these hydrocarbons.
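The configuration search described above can be sketched as an exhaustive loop over source subsets. This is an illustration under stated assumptions, not Mayrsohn and Crabtree's program: we take the "standard error" to be $\sqrt{\mathrm{RSS}/(n-p)}$ for $n$ compounds and $p$ fitted sources, ignore the qualifying assumptions of ref. (12), and use hypothetical four-compound profiles for three sources.

```python
import math
from itertools import combinations


def solve(m, v):
    """Solve the square system m @ x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def ols(cols, x):
    """Ordinary least squares for x ~ sum_k f_k * cols[k] (no intercept)."""
    p = len(cols)
    ata = [[sum(u * v for u, v in zip(cols[r], cols[c])) for c in range(p)] for r in range(p)]
    atx = [sum(u * xi for u, xi in zip(cols[r], x)) for r in range(p)]
    return solve(ata, atx)


def best_positive_configuration(profiles, x):
    """Fit every subset of sources, keep fits whose contributions are all positive,
    and return (standard_error, subset, contributions) with the lowest standard error."""
    n = len(x)
    best = None
    for size in range(1, len(profiles) + 1):
        if n - size <= 0:  # need residual degrees of freedom
            continue
        for subset in combinations(range(len(profiles)), size):
            cols = [profiles[k] for k in subset]
            f = ols(cols, x)
            if any(fk <= 0 for fk in f):
                continue  # reject configurations with non-positive contributions
            resid = [xi - sum(fk * c[i] for fk, c in zip(f, cols)) for i, xi in enumerate(x)]
            se = math.sqrt(sum(r * r for r in resid) / (n - size))
            if best is None or se < best[0]:
                best = (se, subset, f)
    return best


# Hypothetical compound profiles for three sources (each list sums to 1).
profiles = [[0.5, 0.3, 0.1, 0.1],
            [0.1, 0.1, 0.4, 0.4],
            [0.3, 0.3, 0.2, 0.2]]
x = [5.2, 3.2, 2.8, 2.8]  # ambient compound concentrations
se, subset, f = best_positive_configuration(profiles, x)
print(subset, [round(v, 3) for v in f], round(se, 4))
```

Here the fit with all three sources reproduces the data exactly but assigns a negative contribution to the third source, so it is discarded and the two-source configuration is selected.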

A similar mass balance study for resolving hydrocarbon sources has been made by Nelson et al. (13), who examined atmospheric hydrocarbons in Sydney, Australia. They used much more extensive hydrocarbon profiles for their sources and obtained good agreement between the mass balance approach and a resolution based on an emission inventory. They also found that the major hydrocarbon sources were direct automobile exhaust (36 ± 4%) and evaporative emissions of gasoline (32 ± 4%). Thus, it was possible to identify the impact of highway emissions on gaseous as well as particulate pollutants.

In 1979, Watson (14) and Dunker (15) independently suggested a mathematical formalism called effective variance weighting that includes the uncertainty in the measured source composition profiles as well as the uncertainties in the ambient concentrations. As part of this analysis, a method was also developed to calculate the uncertainties in the mass contributions. Effective-variance least squares has been incorporated into the standard personal computer software developed by the U.S. EPA for receptor modeling.
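A minimal sketch of effective-variance weighting, assuming the commonly cited form in which the variance attached to element $i$ is $V_i = \sigma_{x_i}^2 + \sum_k f_k^2 \sigma_{a_{ik}}^2$ and the weighted fit is repeated until the contributions stop changing; the contribution uncertainties are taken as the square roots of the diagonal of $(A^T V^{-1} A)^{-1}$. This is not the EPA software; the function name and all input values are hypothetical.

```python
import math


def solve(m, v):
    """Solve the square system m @ x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def effective_variance_cmb(a, x, sx, sa, max_iter=50, tol=1e-10):
    """Effective-variance weighted CMB fit for one sample.

    a[i][k], sa[i][k] -- profile value and its uncertainty (element i, source k)
    x[i], sx[i]       -- ambient concentration and its uncertainty
    Returns (f, sigma_f): contributions and their standard errors.
    """
    n_el, n_src = len(a), len(a[0])

    def normal_equations(f):
        # Effective variance V_i = sx_i^2 + sum_k (f_k * sa_ik)^2
        v = [sx[i] ** 2 + sum((f[k] * sa[i][k]) ** 2 for k in range(n_src))
             for i in range(n_el)]
        ata = [[sum(a[i][k] * a[i][l] / v[i] for i in range(n_el)) for l in range(n_src)]
               for k in range(n_src)]
        atx = [sum(a[i][k] * x[i] / v[i] for i in range(n_el)) for k in range(n_src)]
        return ata, atx

    f = [0.0] * n_src  # first pass reduces to ordinary weighted least squares
    for _ in range(max_iter):
        ata, atx = normal_equations(f)
        f_new = solve(ata, atx)
        done = max(abs(p - q) for p, q in zip(f_new, f)) < tol
        f = f_new
        if done:
            break
    ata, _ = normal_equations(f)
    # Standard errors: square roots of the diagonal of (A^T V^-1 A)^-1
    sigma_f = []
    for k in range(n_src):
        unit = [1.0 if j == k else 0.0 for j in range(n_src)]
        sigma_f.append(math.sqrt(solve(ata, unit)[k]))
    return f, sigma_f


a = [[0.30, 0.01], [0.02, 0.20], [0.05, 0.05]]
sa = [[0.03, 0.002], [0.004, 0.02], [0.01, 0.01]]
x = [3.2, 4.2, 1.5]
sx = [0.1, 0.2, 0.1]
f, sigma_f = effective_variance_cmb(a, x, sx, sa)
print(f, sigma_f)
```

With these noise-free synthetic data the weights cannot change the contributions, only their uncertainties; with real data both are affected.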

The most extensive use of effective-variance fitting has been made by Watson and co-workers (14,16) in their work on data from Portland, OR. Since that study, a number of other applications of this approach have been made, including Medford, OR (17), Philadelphia, PA (18,19), and a number of locations in the U.S. Environmental Protection Agency's Inhalable Particulate Network (20).

It must be made clear, however, that the CMB analysis works well in these examples because both the source and ambient samples were collected and analyzed during the same time period. Only a much less detailed resolution of lead sources was possible in Kellogg, ID (21), where on-site samples could not be obtained. In an intercomparison study of receptor models organized by the U.S. Environmental Protection Agency (22), a set of ambient particulate elemental composition data was analyzed by a number of investigators using similar CMB methods. The compositions of particles from sources in Houston, TX, were not available and were not measured during this program, so source composition profiles had to be taken from the literature. The lack of source data immediately raised problems in the use of the mass balance methods and in the comparison of results from different investigators (22). It is not always certain exactly which sources should be included in the analysis. Even when emission inventories are available for the region, the measured source composition for a coal-fired power plant in Maryland burning eastern bituminous coal may not be a particularly good representation of a lignite-burning plant in Texas.

An additional problem for receptor modeling is that the lead and bromine concentrations in the motor vehicle profile in the United States are changing rapidly as new, catalyst-equipped cars and diesel cars and trucks replace the remaining vehicles burning leaded fuel. An interesting solution to the problem of the changing lead concentration in motor vehicle emissions was recently provided by Dzubay et al. (19). They obtained particle samples in the summer of 1982 in and around Philadelphia, PA, in the size ranges < 2.5 μm and 2.5-10 μm using a dichotomous sampler. The samples were analyzed by ion chromatography for sulfate and nitrate, by X-ray fluorescence (XRF) and instrumental neutron activation analysis (INAA) for elemental composition, and by a thermo-optical method for organic and elemental carbon. Because there is also a non-ferrous metal smelter in the airshed, lead in the air comes from incinerators, the smelter, and tailpipe emissions. Using the other measured species in the data set, they derived the amounts of lead that could be attributed to all sources other than motor vehicles. They then used a second multiple regression analysis to relate the unaccounted lead (total lead minus the lead from all sources other than vehicles) to the motor vehicle source, and obtained a value of 6% lead in motor vehicle emissions. It appears that as long as sufficient leaded fuel is still in use, it will be possible to employ an approach such as this one to obtain the current fleet-weighted average. With leaded fuel having been phased out entirely, lead and bromine are no longer useful tracers for motor vehicles (23). A similar trend will now be starting in Europe as lead concentrations are reduced during the next few years.
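The second regression step can be sketched as a regression of the unaccounted lead on an estimate of the motor-vehicle mass contribution in each sample. As a simplifying assumption, the sketch fits a line through the origin, so the slope is the lead mass fraction of vehicle emissions; the sample values are synthetic, constructed with a 6% lead fraction, and are not data from Dzubay et al.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope b of y = b * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Synthetic samples (ug/m3): total Pb, Pb attributed to non-vehicle sources,
# and an independent (hypothetical) estimate of the motor-vehicle mass contribution.
pb_total = [0.95, 1.60, 2.10, 2.90]
pb_other = [0.35, 0.40, 0.30, 0.50]
vehicle_mass = [10.0, 20.0, 30.0, 40.0]

pb_unaccounted = [t - o for t, o in zip(pb_total, pb_other)]
pb_fraction = slope_through_origin(vehicle_mass, pb_unaccounted)
print(f"estimated Pb fraction in vehicle emissions: {pb_fraction:.3f}")
```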

Since motor vehicles are an important source of particles, it is helpful to know that other tracers for automobiles may be appearing. As part of the Philadelphia study discussed above, Olmez and Gordon (24) identified unusually high values of the rare earth elements lanthanum, cerium, and samarium arising from catalyst support material released by an oil refinery. It is likely that similar materials are emitted from the catalytic converters in automobiles and could serve as new markers for tailpipe emissions.

The results of Mayrsohn and Crabtree (11) and Nelson et al. (13) suggest that a mass balance is applicable to the gaseous aliphatic hydrocarbons. These species, along with CO, could possibly provide good tracers for particulate emissions from highways. Such a result is less likely to be obtained for more reactive species like olefins. There will be problems for semi-volatile species like polycyclic aromatic hydrocarbons (PAHs) because of the partitioning of these species between the gaseous and particulate phases. This problem has recently been reviewed by Pankow (25). The sampling and analysis problems of reactive hydrocarbons, and the modeling needed to account for their reactions in transit from source to receptor, make it very difficult to perform accurate receptor modeling; this is an area that requires considerable additional study.

There are alternative approaches to solving eq. (3). For example, it can be restated as a linear programming problem. Cheng and Hopke (26) have examined the use of the L1-norm and linear programming approaches suggested by Hoagland (27), Henry et al. (28), and Henry (29). Cheng and Hopke (26) found that a weighted, constrained L1-norm approach was much more stable than either ordinary weighted least-squares or effective-variance weighted least-squares methods, at least for the three data sets created for the EPA Receptor Model Intercomparison Workshop. These data sets are described in detail by Currie et al. (30).
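The constrained L1-norm fit is normally posed as a linear program. As a simpler stand-in, the sketch below approximates an unconstrained weighted L1 fit by iteratively reweighted least squares (IRLS), which is a different algorithm from the linear-programming methods cited above; it omits the non-negativity constraint, and the profiles, uncertainties, and contaminated element are hypothetical. It shows why an L1 criterion is less sensitive to a single bad element than ordinary least squares.

```python
def solve(m, v):
    """Solve the square system m @ x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def weighted_ls(a, x, w):
    """Minimize sum_i w_i * (x_i - a_i . f)**2 via the normal equations."""
    n_el, n_src = len(a), len(a[0])
    ata = [[sum(w[i] * a[i][k] * a[i][l] for i in range(n_el)) for l in range(n_src)]
           for k in range(n_src)]
    atx = [sum(w[i] * a[i][k] * x[i] for i in range(n_el)) for k in range(n_src)]
    return solve(ata, atx)


def l1_fit_irls(a, x, sigma, iters=200, eps=1e-10):
    """Approximate minimizer of sum_i |x_i - a_i . f| / sigma_i (no constraints).

    Starts from ordinary least squares, then repeatedly solves a weighted
    least-squares problem with weights 1 / (sigma_i * max(|r_i|, eps)),
    where r_i are the residuals of the previous fit.
    """
    f = weighted_ls(a, x, [1.0] * len(x))
    for _ in range(iters):
        r = [x[i] - sum(a[i][k] * f[k] for k in range(len(f))) for i in range(len(x))]
        w = [1.0 / (sigma[i] * max(abs(r[i]), eps)) for i in range(len(x))]
        f = weighted_ls(a, x, w)
    return f


# Hypothetical profiles (rows: elements, columns: two sources). The ambient values
# follow contributions of 10 and 20, except the last element, deliberately
# contaminated (0.8 -> 3.0).
a = [[0.20, 0.01], [0.01, 0.15], [0.10, 0.10], [0.05, 0.02], [0.02, 0.03]]
x = [2.2, 3.1, 3.0, 0.9, 3.0]
sigma = [1.0] * len(x)

f_ols = weighted_ls(a, x, [1.0] * len(x))
f_l1 = l1_fit_irls(a, x, sigma)
print("least squares:", [round(v, 3) for v in f_ols])
print("IRLS L1:      ", [round(v, 3) for v in f_l1])
```

The least-squares estimate is pulled away from the contributions used to build the data, while the L1 fit essentially ignores the contaminated element.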

These same EPA data sets have also been reanalyzed using non-negative, weighted least-squares methods. In these studies, Wang and Hopke (31) concluded that these methods provide a valuable analysis of the rank of the source profile matrix and physically meaningful, non-negative mass contributions. However, they suggest that the methods might lead to incorrect results if the proper source profiles are not used in the fitting process. Thus, there are statistical methods that are useful for extracting estimates of the mass contributions when both the source profiles and the ambient concentrations are known. However, it is often the case that the measured profiles are too similar to one another to be resolved successfully. Other methods are therefore needed to increase the amount of information available about the source and ambient particles.
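Non-negative least squares can be computed in several ways; the classic choice is the Lawson-Hanson active-set algorithm. The sketch below instead uses cyclic coordinate descent, a simpler method that converges to the same solution when the normal matrix is positive definite. It is not necessarily the algorithm used by Wang and Hopke (31); the function name and the three-source profiles are hypothetical, chosen so that the unconstrained fit would assign a negative contribution to the third source.

```python
def nnls_coordinate_descent(a, x, sigma, max_sweeps=10000, tol=1e-12):
    """Minimize sum_i ((x_i - a_i . f) / sigma_i)**2 subject to f >= 0.

    Each step minimizes exactly over one f_k with the others held fixed and
    clips the result at zero; sweeps stop when no f_k changes by more than tol.
    """
    n_el, n_src = len(a), len(a[0])
    w = [1.0 / s ** 2 for s in sigma]
    g = [[sum(w[i] * a[i][k] * a[i][l] for i in range(n_el)) for l in range(n_src)]
         for k in range(n_src)]                                # A^T W A
    h = [sum(w[i] * a[i][k] * x[i] for i in range(n_el)) for k in range(n_src)]  # A^T W x
    f = [0.0] * n_src
    for _ in range(max_sweeps):
        max_change = 0.0
        for k in range(n_src):
            rest = sum(g[k][l] * f[l] for l in range(n_src) if l != k)
            new = max(0.0, (h[k] - rest) / g[k][k])
            max_change = max(max_change, abs(new - f[k]))
            f[k] = new
        if max_change < tol:
            break
    return f


# Hypothetical profiles: rows = elements/compounds, columns = three sources.
a = [[0.5, 0.1, 0.3],
     [0.3, 0.1, 0.3],
     [0.1, 0.4, 0.2],
     [0.1, 0.4, 0.2]]
x = [5.2, 3.2, 2.8, 2.8]
f = nnls_coordinate_descent(a, x, [1.0] * len(x))
print([round(v, 4) for v in f])
```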

One such method is computer-controlled scanning electron microscopy (CCSEM). The analysis of microscopic features of individual particles, such as their chemical composition, provides much more information from each sample than can be obtained from bulk analysis. The ability to perform microscopic analyses on a number of samples therefore permits the use of CCSEM techniques in receptor models. CCSEM is an extension of individual particle characterization by optical microscopy and scanning electron microscopy (SEM). The microscope has long been employed to determine those characteristics or features that are too small to be detected by the naked eye. The use of optical microscopy in receptor models has been described by Crutcher (32). Optical microscopic investigation of particle samples and its application to source apportionment have been illustrated in detail by Hopke (33). The ability of the scanning electron microscope equipped with X-ray detection capabilities (SEM/XRF system) to provide size, shape, and elemental composition data extends the utility of microscopic examinations. For example, several studies have used the SEM to analyze samples of coal-fired power plant ash (34,35) and volcanic ash (36). However, these studies are limited in the number of particles analyzed, since SEM has the disadvantage that examining particles manually is time-consuming.

CCSEM can provide an important additional method in the area of receptor modeling. Casuccio et al. (37) and Hopke (33) have surveyed the initial applications of CCSEM to the elemental investigation of particles and its ability to identify particle sources in receptor model studies. A number of previous studies have shown that CCSEM is capable of determining the characteristics of individual particles (38,39). The significant improvement of CCSEM is the coupling of a computer to control the SEM. Three analytical tools are under computer control in the CCSEM: (1) the SEM, (2) the energy dispersive X-ray analyzer, and (3) the digital scan generator for image processing (37). CCSEM rapidly examines individual particles in samples and provides their elemental compositions as well as their aerodynamic diameters and shape factors. Based on these characteristics, particles can be assigned to a number of well-defined classes. These particle classes become the basis for characterizing sources, so accurate particle classification is a key step in using CCSEM data in receptor modeling.

Particle classification can be accomplished by agglomerative, hierarchical cluster analysis along with rule-building expert systems. Particles with similar compositions are grouped by the cluster analysis, and sample-to-sample differences can then be clearly distinguished by comparing the cluster patterns of the samples. It is assumed that a source emits various types of particles, but that the mass fractions of particles in the various particle classes differ from source to source and form the fingerprint for that source. The rule-building expert system can help automate the assignment of particles to classes. This idea has been confirmed by successful work on samples collected in El Paso, TX (40) and on particles from a coal-fired power plant (41). CCSEM analysis of individual particles can thus apportion the mass of particles to different sources in the airshed.
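A minimal sketch of the clustering step, assuming average-linkage agglomeration on Euclidean distances between particle composition vectors. The composition values and masses are hypothetical, the stopping rule (a fixed number of classes) is our simplification, and the rule-building expert system is not represented. The class mass fractions computed at the end are the kind of source fingerprint described above.

```python
import math


def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage on Euclidean distance.

    Starts with one cluster per particle and repeatedly merges the closest
    pair of clusters until n_clusters remain. Returns sorted index lists.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = sum(math.dist(points[i], points[j])
                        for i in clusters[p] for j in clusters[q])
                d /= len(clusters[p]) * len(clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return sorted(sorted(c) for c in clusters)


def class_mass_fractions(clusters, masses):
    """Fraction of the total particle mass falling in each class."""
    total = sum(masses)
    return [sum(masses[i] for i in c) / total for c in clusters]


# Hypothetical particle compositions (relative Si, Al, Fe signals) and masses.
particles = [[0.60, 0.30, 0.10], [0.55, 0.35, 0.10], [0.62, 0.28, 0.10],
             [0.05, 0.05, 0.90], [0.10, 0.05, 0.85], [0.08, 0.02, 0.90]]
masses = [1.0, 1.0, 2.0, 2.0, 2.0, 2.0]

classes = average_linkage(particles, 2)
fingerprint = class_mass_fractions(classes, masses)
print(classes, fingerprint)
```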

3.3  Illustrative  example  of  CMB  analysis 

To illustrate the use of the CMB method, an example is taken from the study by Glover et al. (42) of the sources of airborne particulate matter in Granite City, IL. With the promulgation of the new National Ambient Air Quality Standards for particulate matter of 10 μm and smaller (PM10), it has been necessary to review the State Implementation Plan (SIP) in each state for the areas most likely to be out of compliance with the new standard. In Illinois, one such area is Granite City, an industrial city northeast of St. Louis, MO, that has a history of non-attainment for total suspended particulate and airborne lead.

The locations of the major industries in Granite City and of the ambient airborne particulate sampler are shown in Fig. 1. The local industries include steel mills (American Steel and Granite City Steel), a secondary lead smelter (Taracorp), an aluminum smelter (SCI), and a chemical plant (Jennison Wright). There is also a U.S. Army Corps of Engineers storage facility located at the edge of town. Fig. 2 shows the locations of the major industries in the greater St. Louis metropolitan area relative to the ambient airborne particulate sampler.

Fig. 1. Granite City local point sources.

As part of the studies necessary to prepare an effective and efficient SIP, receptor modeling has been applied to elemental composition data for 24-h airborne particle samples taken in Granite City by the Illinois State Water Survey using an automated dichotomous sampler. The sampler's inlet excludes large particles, with a 50% transmission efficiency for 10 μm particles. The particles that penetrate into the sampler are separated into two aerodynamic size fractions, < 2.5 μm (fine) and 2.5-10 μm (coarse), and are collected on Teflon filters, which are then available for chemical analysis.


The particle samples were subjected to both XRF and INAA to provide the input data for receptor modeling; 48 sample pairs (fine and coarse) were thus analyzed for 33 elements. Each of these samples was then subjected to two CMB analyses. For the first analysis, the source profiles were taken from libraries available in the literature. To supplement these literature profiles, 12 dust samples were collected in and around Granite City, IL. These were aerosolized, sampled, and analyzed by XRF and INAA to provide site-specific source profiles for the second CMB analysis.

In an attempt to account for more of the mass on each ambient filter, total carbon was measured seven times during the ambient sampling period. A Sierra PM10 sampler equipped with quartz fiber filters was collocated with the dichotomous sampler for this purpose. Each quartz filter was analyzed for total PM10 mass and total carbon mass. After the PM10 mass of each filter was determined, the filter was treated with HCl to remove any carbonate. Each filter was then oxidized at 800 °C, converting the elemental and organic carbon to CO2, and the amount of CO2 released was measured with a Dohrmann carbon analyzer. A linear regression was used to relate the mass of total carbon to the total PM10 mass of each quartz filter. This regression is represented by

$$\mathrm{TC} = 0.074 \times \mathrm{PM}_{10} + 3.129 \tag{5}$$

where TC and PM10 are both measured in μg/m³.
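The calibration in eq. (5) is an ordinary least-squares line. The sketch below shows the computation on seven synthetic collocated PM10/TC pairs generated from eq. (5) itself, so the fit should simply return its coefficients; the actual quartz-filter measurements are not reproduced here, and the function name `fit_line` is ours.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


pm10 = [20.0, 35.0, 50.0, 42.0, 60.0, 28.0, 75.0]   # ug/m3, synthetic
tc = [0.074 * p + 3.129 for p in pm10]              # ug/m3, generated from eq. (5)
slope, intercept = fit_line(pm10, tc)
print(f"TC = {slope:.3f} x PM10 + {intercept:.3f}")
```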

CCSEM (37) was used to partition the total carbon measurements between the fine and coarse fractions. The first and last quartz filters collected were analyzed by CCSEM. The number distribution, physical mass distribution, and aerodynamic mass distribution of the particles on each filter were determined, along with an elemental analysis of the particles. From the CCSEM measurements, the total carbon was apportioned between the fine and coarse fractions by

$$\mathrm{TC}_{\mathrm{fine}} = 0.919 \times \mathrm{TC} \tag{6}$$

$$\mathrm{TC}_{\mathrm{coarse}} = 0.082 \times \mathrm{TC} \tag{7}$$

The PM10 mass on each of the quartz filters was scaled to the PM10 mass collected on the Teflon disks. The mass of each pair of fine and coarse Teflon disks was added to find the total PM10 mass on the Teflon disks. $\mathrm{TC}_{\mathrm{fine}}$ and $\mathrm{TC}_{\mathrm{coarse}}$ for the Teflon disks were found by multiplying the scaling factor for each sample by eqs. (6) and (7), respectively.
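One possible reading of this procedure, sketched below, is to estimate TC for each Teflon disk pair by applying eq. (5) to the summed fine and coarse mass and then to split it with eqs. (6) and (7). This interpretation of the "scaling factor" is an assumption, and the function name and sample masses are hypothetical.

```python
def carbon_by_fraction(fine_mass, coarse_mass):
    """Estimate fine and coarse total carbon (ug/m3) for one dichotomous sample."""
    pm10 = fine_mass + coarse_mass      # total PM10 on the Teflon disk pair
    tc = 0.074 * pm10 + 3.129           # eq. (5)
    return 0.919 * tc, 0.082 * tc       # eqs. (6) and (7)


tc_fine, tc_coarse = carbon_by_fraction(30.0, 20.0)
print(round(tc_fine, 3), round(tc_coarse, 3))
```

Note that the published coefficients in eqs. (6) and (7) sum to 1.001 rather than exactly 1, presumably because of rounding.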
3.3.1 Initial chemical mass balance

The initial CMB analysis identified several sources of particulate material in the Granite City area. Figs. 3 and 4 show the identified source types for the fine and coarse fractions and the direction of each relative to the sampler, based on the average wind direction during sample collection. Fig. 3 shows the regularity of the limestone and regional sulfate contributions to the fine fraction. Motor vehicle emissions were also observed to be coming from the highway to the north. Besides these fugitive and non-point sources, the local steel plants and lead smelters were observed to be major emission sources. Fugitive emissions from Granite City Steel appear as the urban dust coming from the southeast. The zinc source to the east is the galvanizing operation at Granite City Steel; this source is located to the west of the International Mill Service complex in Fig. 1. The coal-fired power plant identified to the east is Granite City Steel's coking operations, while Taracorp's furnaces are the power plant identified to the southwest. Among the more distant sources identified was a fertilizer plant located 5 km south of the sampler. The refinery complex 15 km to the north and the copper smelter 15 km to the south also appeared in the initial CMB results. The coal-fired power plant identified to the north of the sampler is probably the facility located between the Mississippi and Missouri Rivers, since there are no local sources with similar characteristics in that direction, while the oil combustion source(s) to the southwest are the two oil-fired power plants in that direction. The zinc smelter 12 km south of the sampler was expected to be a major source of fine zinc. However, the current study did not find appreciable amounts of zinc coming from the south.

Fig. 4. Identified sources of coarse fraction material.

Fig. 4 shows the predominance of limestone and urban dust in the coarse fraction, along with the local steel and lead sources. Besides the metal emissions from the steel plant, the coking operations at Granite City Steel appear as a combination of the sulfate emissions and the coal-fired power plant profile. As in the fine fraction, the galvanizing operation's zinc emissions and the lead smelter's combustion source appear to the east and southwest, respectively. The coke pile(s) identified to the west are at American Steel or Taracorp. American Steel is identified by the zinc source to the southwest and the coal sources to the west. The oil source to the northwest is the chemical treatment facility for railroad ties at Jennison Wright.

The initial CMB results show that the composition of air pollution in the St. Louis area has changed over the last ten years. Only one fifth of the fine profiles and one fourth of the coarse profiles used in the first CMB analysis were taken from the profiles derived from the 1975 to 1977 RAPS results. These profiles accounted for 11 and 20% of all of the identified fine and coarse mass, respectively. The remaining profiles used in the current work were taken from more recent pollution source studies at various sites throughout the U.S.

TABLE 1
R² adjusted for degrees of freedom

               Fine fraction            Coarse fraction
Sample     Initial   Final  Change   Initial   Final  Change
03/09/86     0.978   0.976  -0.002     0.812   0.811  -0.001
03/17/86     0.981   0.998   0.017     0.959   0.995   0.036
03/22/86     0.919   0.921   0.002     0.947   0.992   0.045
03/25/86     0.981   0.985   0.004     0.683   0.837   0.154
04/15/86     0.983   0.967  -0.016     0.683   0.837   0.154
04/18/86     0.933   0.993   0.060     0.810   0.970   0.160
01/21/86     0.949   0.956   0.007     0.941   0.950   0.009
05/23/86     0.891   0.926   0.035     0.878   0.983   0.105
05/23/86     0.947   0.960   0.013     0.797   0.982   0.185
05/25/86     0.957   0.964   0.007     0.862   0.857  -0.005
05/26/86     0.979   0.990   0.011     0.670   0.867   0.197
07/24/86     0.951   0.981   0.030     0.981   0.991   0.010
08/05/86     0.878   0.950   0.072     0.975   0.990   0.015
08/10/86     0.946   0.940  -0.006     0.968   0.990   0.022
10/18/86     0.991   0.958  -0.033     0.852   0.956   0.104
10/23/86     0.800   0.871   0.071     0.931   0.971   0.040
10/28/86     0.949   0.964   0.015     0.970   0.995   0.025
11/10/86     0.929   0.895  -0.034     0.947   0.994   0.047
11/11/86     0.965   0.967   0.002     0.972   0.969  -0.003
12/03/86     0.812   0.848   0.036     0.972   0.971  -0.001
12/07/86     0.802   0.843   0.041     0.805   0.969   0.164
01/29/87     0.985   0.965  -0.020     0.618   0.619   0.001
02/01/87     0.947   0.972   0.025     0.766   0.822   0.056
05/04/87     0.991   0.992   0.001     0.838   0.974   0.136
05/23/87     0.959   0.988   0.029     0.999   0.998  -0.001
05/25/87     0.852   0.857   0.005     0.742   0.827   0.085
06/06/87     0.979   0.990   0.011     0.911   0.955   0.044
06/12/87     0.985   0.983  -0.002     0.945   0.950   0.005
Average      0.936   0.950             0.875   0.935
Avg. gain                    0.023                     0.073
Avg. loss                   -0.016                    -0.002

Fig. 5. Predicted mass fraction of selected fine fraction CMB results, March 1986-June 1987, Granite City, IL.

Fig. 6. Predicted mass fraction of selected coarse fraction CMB results, March 1986-June 1987, Granite City, IL.

3.3.2  Second  chemical  mass  balance 

By including the local dust samples among the potential source profiles in the second CMB analysis, a marked improvement in the quality of the predicted results was achieved. The reanalysis did not change the types of sources identified by the CMB analysis. However, the apportionment among sources varied enough to change the relative importance of the sources. The improvement in the results can be seen in Table 1, where the average value of the adjusted R² increased for both fractions. (The R² values were adjusted to account for the number of different sources identified for each sample.) This increase was especially apparent for the coarse fraction, where the average negative change was less than one quarter of 1% while the average positive change was above 7%. Fig. 5 shows that the predicted mass of the fine fraction became closer to the observed mass with only a slight increase in error. (The error in the initial predicted results was influenced by the use of an artificial sulfur component, a source containing only sulfur, which caused the initial error to be fairly low.) Fig. 6 shows that the predicted mass of the coarse fraction increased while the associated error decreased. Fig. 7 shows that the predicted mass of the fine fraction fitting elements changed from an average over-prediction to an average under-prediction. Similar results were obtained for the coarse fraction samples. Under-prediction is the more desirable error since, during the fitting process, it is more difficult to explain mass that was not observed than to leave part of the observed mass unexplained. There are always other unidentified sources that might explain the unaccounted-for mass.

Fig. 7. Fine fraction ambient filter mass and mass of CMB fitting elements, March 1986-June 1987, Granite City, IL (series: initial predicted mass, observed mass, final predicted mass; horizontal axis: date).
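The text does not give the exact adjustment formula. A common choice, assumed in the sketch below, is $\bar R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)$ with $n$ fitting elements and $p$ fitted sources; the function name and the observed and predicted values are hypothetical.

```python
def adjusted_r2(observed, predicted, n_sources):
    """R^2 of a fit, penalized for the number of fitted sources (assumed form)."""
    n = len(observed)
    mean = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_sources - 1)


observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
for p in (2, 3):
    print(p, round(adjusted_r2(observed, predicted, p), 4))
```

For the same residuals, attributing the fit to more sources lowers the adjusted value, which is why the adjustment allows samples fitted with different numbers of sources to be compared.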

3.3.3  Total  carbon  results 

In the CCSEM analyses, carbon was found to be a major component of the fine fraction. However, carbon was never fit well in the CMB analyses. The lack of carbon information in many of the source profiles compounded the problem of having few ambient carbon data.

3.3.4  Conclusions 

By measuring the ambient filters by both XRF and INAA, a relatively complete set of elemental measurements was obtained. The usefulness of these data was limited by the current unavailability of source profiles that include these elements. The lack of data on carbon was a special problem in the present study, since the limited ambient information did identify carbon as an important part of the fine mass. The inclusion of site-specific profiles in a receptor-oriented source apportionment program improved the overall quality of the source apportionment results relative to those obtained using only literature profiles. While not identifying new sources, the site-specific profiles significantly improved the R² of the coarse fraction and decreased the error values of the coarse fraction's predicted results. Considering that the initial fine fraction CMB required a unique sulfur factor to achieve the best fit, the fine fraction results also indicate that better receptor modeling results are achieved by using site-specific profiles for fugitive emissions. If possible, site-specific fugitive dust profiles should be collected and analyzed during the course of future studies employing receptor models.

In many situations, locally measured source profiles are not available, or there may have been significant changes in the particle-producing activities in the airshed since the profiles were measured. Thus, it is helpful to have methods that can extract information from the ambient data alone as to the number, nature, and mass contributions of the particle sources in an area. These methods use multivariate statistical analysis to obtain the required receptor modeling information.


4  MULTIVARIATE  RECEPTOR  MODELS 

4.1  Introduction 

Alternative approaches have been developed for identifying and quantitatively apportioning sources of airborne particles using multivariate statistical analysis. Eigenvector analysis has been the principal method applied to airborne particle composition data. An eigenvector analysis tries to simplify the description of a system by determining the minimum number of new variables necessary to reproduce the measured attributes of the system. The mathematical basis of these methods has been described by Hopke (33).

Principal components and factor analysis are names given to several of the many forms of eigenvector analysis. Eigenvector analysis was originally developed and used in psychology to provide mathematical models of psychological theories of human ability and behavior (43). However, it has found wide application throughout the physical and life sciences. Unfortunately, a great deal of confusion exists in the literature regarding its terminology. Various changes in the way the method is applied have resulted in it being called factor analysis, principal components analysis, principal components factor analysis, empirical orthogonal function analysis, Karhunen-Loève transform, etc., depending on the way the data are scaled before analysis or how the resulting vectors are treated after the eigenvector analysis is completed. All of these methods have the same basic objective: the compression of data into fewer dimensions and the identification of the structure of interrelationships that exist between the variables measured or the cases studied.

4.2  Mathematical  procedures 

The first step in eigenvector analysis is the calculation of a dispersion matrix, the matrix that contains quantitative information on the relative variation of pairs of variables or pairs of samples (cases). There are two basic types of dispersion matrices: covariance matrices and correlation matrices. For a correlation matrix, the data are scaled such that each variable or each case has equal weight. The data are not scaled before calculating a covariance matrix. In both instances, the data may be centered by subtracting a mean value before scaling and calculating the matrix elements. The choice of dispersion matrix depends on the nature of the data set to be analyzed. For many types of chemical spectroscopic data, the covariance matrix is the usual choice because each variable has the same measurement scale. For many geochemical problems, the difference in scale between major, minor, and trace components requires scaling to avoid domination of the analysis by the major components.
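The effect of the scaling choice can be illustrated with a short NumPy sketch on made-up data (the three "constituents" below are hypothetical, not taken from any study). Without scaling, the covariance matrix is dominated by the major constituent; dividing by the standard deviations yields the correlation matrix, in which every variable carries unit weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 samples x 3 constituents on very different scales
X = np.column_stack([
    rng.normal(1000.0, 100.0, 50),  # major constituent
    rng.normal(10.0, 2.0, 50),      # minor constituent
    rng.normal(0.1, 0.03, 50),      # trace constituent
])

# Centre each variable, then form the covariance matrix (no scaling)
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (X.shape[0] - 1)

# Divide by the standard deviations to obtain the correlation matrix,
# which gives every variable unit variance and hence equal weight
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)

print(np.diag(cov))   # variances differ by orders of magnitude
print(np.diag(corr))  # all ones
```

The hand-built matrices agree with `np.cov` and `np.corrcoef`; they are written out here only to make the centring and scaling steps explicit.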

The dispersion matrix is then decomposed into a series of orthogonal vectors by the process outlined by Jöreskog et al. (44) so that

$$U' D U = \Lambda \qquad (1)$$

where U is the matrix of eigenvectors, U' is its transpose, D is the dispersion matrix, and Λ is a diagonal matrix of eigenvalues whose trace is equal to the trace of D. If there were no errors in the data from which D is calculated, the number of non-zero eigenvalues would be the dimensionality of the problem, called the rank of D. The rank of the original data matrix is the same as that of the dispersion matrix. However, experimental error generally results in a number of small but non-zero eigenvalues. Deciding which vectors contain significant information and which are dominated by noise is often difficult. The lack of universally applicable criteria for determining the dimensionality of the data is a major problem in the application of factor analysis.
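Eq. (1) and the effect of measurement error on the rank can be checked numerically. The following sketch uses synthetic data built from three hypothetical "sources" plus a small random error; the noise level and the 1e-3 cut-off are illustrative assumptions, not a general criterion for choosing the number of factors.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 40 samples x 8 variables generated by 3 "sources",
# plus a small random measurement error
n_samples, n_vars, n_sources = 40, 8, 3
X = rng.random((n_samples, n_sources)) @ rng.random((n_sources, n_vars))
X += rng.normal(0.0, 0.001, X.shape)

D = np.corrcoef(X, rowvar=False)        # dispersion matrix (correlation)
evals, U = np.linalg.eigh(D)            # eigh is meant for symmetric matrices
order = np.argsort(evals)[::-1]         # sort eigenvalues, largest first
evals, U = evals[order], U[:, order]

Lam = U.T @ D @ U                       # U'DU = Lambda, Eq. (1)
print(np.round(evals, 5))               # three large values; the rest small but non-zero
```

Without the added error the last five eigenvalues would be zero to rounding precision; with it they become small positive numbers, which is exactly the ambiguity described above.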

In the most commonly used approach to calculating the eigenvectors, the maximum amount of variance is packed into the first eigenvalue. The maximum possible amount of the remaining variance goes into the second, and so forth. This compression of the information into a few components permits much of the variation in the data set to be displayed in a two- or three-dimensional plot. For many classification problems, the first few factors are able to reproduce most of the data structure and to remove some of the noise. The objects can then be plotted on the component axes, thus displaying the features of high-dimensional data in a few dimensions (45).
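A minimal sketch of this compression, assuming plain centred principal components computed by singular value decomposition. The helper name `pca_scores` and the two-group data set are hypothetical and serve only to show that the first component carries most of the variance and separates the groups in a two-dimensional score plot.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Scores on the first principal axes and the variance fraction per axis."""
    Xc = X - X.mean(axis=0)                        # centre the data
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)                # largest share comes first
    scores = Xc @ Vt[:n_components].T              # coordinates for a 2-D plot
    return scores, explained

rng = np.random.default_rng(2)
# Hypothetical 10-variable data: two groups of 20 samples whose means differ
group_a = rng.normal(0.0, 1.0, (20, 10))
group_b = rng.normal(5.0, 1.0, (20, 10))
scores, explained = pca_scores(np.vstack([group_a, group_b]))
print(np.round(explained[:3], 2))   # the first component dominates
```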

Compressing the variance into the first factors makes the number of factors easier to determine. However, the calculational method mixes the nature of the factors, so that each factor may combine contributions from several physical sources. Thus, once the number of factors has been determined, it is often useful to rotate the axes in order to provide a more interpretable structure.

The axis rotation can retain the orthogonality of the eigenvectors or cause them to be oblique. Depending on the initial data treatment, the axis rotations may be in the scaled and/or centered space or in the original variable scale space. The latter approach has proved quite useful in a number of chemical applications described by Malinowski and Howery (46) and in environmental systems as described by Hopke (33).
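As one concrete example of an orthogonal rotation, the sketch below implements the widely used SVD-based varimax iteration (the varimax rotation is the one applied in several of the studies reviewed next). The loading matrix is a hypothetical "simple structure" that has been deliberately mixed by a 30° rotation; varimax restores it up to sign and column order while keeping the axes orthogonal.

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=1000, tol=1e-10):
    """Orthogonal varimax rotation of a loading matrix L (variables x factors).

    Returns the rotated loadings and the orthogonal rotation matrix R.
    """
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        LR = L @ R
        # Gradient of the varimax criterion; its polar factor gives the update
        G = L.T @ (LR**3 - (gamma / p) * LR @ np.diag(np.sum(LR**2, axis=0)))
        u, s, vt = np.linalg.svd(G)
        R = u @ vt
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ R, R

# Hypothetical simple-structure loadings, deliberately rotated by 30 degrees
L_true = np.array([[0.9, 0.0], [0.8, 0.0], [0.85, 0.0],
                   [0.0, 0.9], [0.0, 0.7], [0.0, 0.8]])
t = np.deg2rad(30.0)
rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
L_rot, R = varimax(L_true @ rot)
print(np.round(L_rot, 3))   # each variable loads on one factor again
```

Because R is orthogonal, the communalities (row sums of squared loadings) are unchanged; only the orientation of the axes differs.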


4.3  Previous  applications 

The first modeling applications of classical factor analysis were by Prinz and Stratmann (47) and Blifford and Meeker (48). Prinz and Stratmann (47) examined both the aromatic hydrocarbon content of the air in 12 West German cities and data from Colucci and Begeman (49) on the air quality of Detroit. In both cases they found three-factor solutions and used an orthogonal varimax rotation to give more readily interpretable results. Blifford and Meeker (48) used a principal component analysis with both varimax and a non-orthogonal rotation to examine particle composition data collected by the National Air Sampling Network (NASN) during 1957-1961 in 30 U.S. cities. They were generally not able to extract much interpretable information from their data. Since there is a very wide variety of particle sources among these 30 cities and only 13 elements were measured, it is not surprising that they could not give much specificity to their factors.

The factor analysis approach was then reintroduced by Hopke et al. (50) and Gaarenstroom et al. (51) for their analyses of particle composition data from Boston, MA and Tucson, AZ, respectively. In the Boston data for 90 samples at a variety of sites, six common factors were identified and interpreted as soil, sea salt, oil-fired power plants, motor vehicles, refuse incineration, and an unknown manganese-selenium source. The six factors accounted for about 78% of the system variance. There was also a high unique factor for bromine that was interpreted to be fresh automobile exhaust. Since lead was not determined, these motor vehicle-related factor loading assignments remain uncertain. Large unique factors for antimony and selenium were found. These factors represent emissions of species whose concentrations do not covary with those of other elements. Subsequent studies by Thurston and Spengler (52), in which other elements including sulfur and lead were measured, showed a similar result. They found that selenium was strongly correlated with sulfur in the warm season (May 6 to November 5). This result agrees with the Whiteface Mountain results (53) and suggests that selenium is an indicator of long-range transport of coal-fired power plant effluents to the northeastern U.S. They also found lead to be strongly correlated with bromine, a relationship readily interpreted as motor vehicle emissions.

In the study of Tucson, AZ (51), whole filter data were analyzed separately at each site. The authors found factors identified as soil, automotive emissions, several secondary aerosols such as (NH4)2SO4, and several unknown factors. They also discovered a factor that represented the variation of elemental composition in the aliquots of their neutron activation standard containing Na, C, K, Fe, Zn, and Mg. This finding illustrates one of the important uses of factor analysis: screening the data for noisy variables or analytical artifacts.

One of the valuable uses of this type of analysis is in screening large data sets to identify errors (54). With the use of atomic and nuclear methods to analyze environmental samples for a multitude of elements, very large data sets have been generated. Because these results are so easily obtained with computerized systems, the elemental data acquired are not always as thoroughly checked as they should be, leading to some, if not many, bad data points. It is advantageous to have an efficient and effective method to identify problems with a data set before it is used for further studies. Principal component factor analysis can provide useful insight into several possible problems that may exist in a data set, including incorrect single values and some types of systematic errors.
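One simple way to use a factor model for screening is to reproduce the standardized data with the retained factors and look for entries that fit much worse than the rest. The sketch below does this on hypothetical two-source data with one deliberately corrupted value; the helper name, the four-MAD threshold, and the data are illustrative assumptions, not the procedure of reference (54).

```python
import numpy as np

def screen_single_values(X, n_factors, z=4.0):
    """Flag entries that a low-dimensional factor model reproduces poorly.

    X is (samples x elements). Returns (flags, residuals) in standardized units.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each element
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    Z_hat = (U[:, :n_factors] * s[:n_factors]) @ Vt[:n_factors]
    resid = Z - Z_hat
    scale = 1.4826 * np.median(np.abs(resid))          # robust (MAD) residual scale
    return np.abs(resid) > z * scale, resid

rng = np.random.default_rng(3)
# Hypothetical two-source data: 60 samples x 8 elements, small analytical noise
B = np.array([[1.0, 0.9, 0.8, 0.5, 0.2, 0.1, 0.0, 0.3],
              [0.0, 0.1, 0.3, 0.5, 0.8, 0.9, 1.0, 0.6]])
X = rng.random((60, 2)) @ B + rng.normal(0.0, 0.02, (60, 8))
X[10, 3] += 2.0                      # simulate a single bad data point

flags, resid = screen_single_values(X, n_factors=2)
worst = np.unravel_index(np.argmax(np.abs(resid)), resid.shape)
print(worst)                         # the injected error is the worst-fitting entry
```

Flagged entries are candidates for re-checking against the raw analytical records, not values to be deleted automatically; a neighbouring entry in the same row or column can also be flagged because the fit partly spreads the error.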

Gatz (55) used a principal components analysis of aerosol composition and meteorological data for St. Louis, MO taken as part of project METROMEX (56,57). Nearly 400 filters collected at 12 sites were analyzed for up to 20 elements by ion-induced XRF. Gatz (55) used additional parameters in his analysis, including day of the week, mean wind speed, percent of time with the wind from the NE, SE, SW, or NW quadrants or variable, ventilation rate, and rain amount and duration. At several sites the inclusion of wind data permitted the extraction of additional factors that allowed identification of motor vehicle emissions in the presence of specific point sources of lead such as a secondary copper smelter. An important advantage of this form of factor analysis is the ability to include parameters such as wind speed and direction or particle size in the analysis.

In the early applications of factor analysis to particulate compositional data, it was generally easy to identify a fine-mode lead-bromine factor that could be assigned to motor vehicle emissions. In many cases, a calcium factor, sometimes associated with lead, could be found in the coarse mode analysis and assigned to road dust. However, the problem of diminishing lead concentrations in gasoline, discussed earlier for the CMB analysis, also applies here. As the lead and related bromine concentrations diminish, the clearly distinguishable covariance of these two elements is disappearing. In a study of particle sources in southeast Chicago, IL based on samples from 1985 and 1986, much lower lead levels are observed and the lead-bromine correlation is quite weak (23). Thus, the identification of highway emissions through factor analysis based on lead, or on lead and bromine, is becoming more and more difficult, and other analytical species will be needed in the future.

A problem with these forms of factor analysis is that they do not permit quantitative source apportionment of particle mass or of specific elemental concentrations. In an effort to find an alternative method that would provide information on source contributions when only the ambient particulate analytical results are available, Hopke and co-workers (58-64) have developed target transformation factor analysis (TTFA), in which uncentered but standardized data are analyzed. In this analysis, resolution similar to that obtained from a CMB analysis can be achieved. However, a CMB analysis can be made on a single sample if the source data are known, whereas TTFA requires a series of samples with varying impacts by the same sources but does not require a priori knowledge of the source characteristics. The objectives of TTFA are (1) to determine the number of independent sources that contribute to the system, (2) to identify the elemental source profiles, and (3) to calculate the contribution of each source to each sample.

One of the first applications of TTFA was to the source identification of urban street dust (59). A sample of street dust was physically fractionated by particle size, density, and magnetic susceptibility to produce 30 subsamples. Each subsample was analyzed by instrumental neutron activation analysis and atomic absorption spectroscopy to yield analytical results for 35 elements. The number of sources is determined by performing an eigenvalue analysis on the matrix of correlations between the samples. A target transformation determines the degree of overlap between an input source profile and one of the calculated factor axes. The input source profiles, called test vectors, are developed from existing knowledge of the emission profiles of various sources or by an iterative technique starting from simple test vectors (63). The identified source profiles are then used in a simple weighted least-squares determination of the mass contributions of the sources (62).
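The core of a target test can be sketched as a least-squares projection of a test vector into the space spanned by the retained factors: a vector lying in that space is reproduced almost exactly, while one that does not is reproduced poorly. In the sketch below the element list, the two profiles, and the function names are hypothetical, and the scaling that TTFA applies to the data is omitted for brevity.

```python
import numpy as np

def factor_basis(X, n_factors):
    """Orthonormal basis for the element space spanned by the retained factors.

    X is (elements x samples); TTFA would scale it first (omitted here).
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_factors]

def target_test(basis, test_vector):
    """Least-squares projection of a test vector into the factor space."""
    t = np.asarray(test_vector, dtype=float)
    t_hat = basis @ (basis.T @ t)
    similarity = float(t_hat @ t / (np.linalg.norm(t_hat) * np.linalg.norm(t)))
    return t_hat, similarity

# Hypothetical profiles for the elements [Al, Si, S, Br, Pb, Zn]
vehicle = np.array([0.0, 0.0, 0.05, 0.3, 1.0, 0.02])
soil = np.array([0.6, 1.0, 0.05, 0.0, 0.05, 0.01])
rng = np.random.default_rng(4)
X = np.column_stack([vehicle, soil]) @ rng.random((2, 30))   # 30 samples

basis = factor_basis(X, n_factors=2)
_, sim_vehicle = target_test(basis, vehicle)
_, sim_sulfur = target_test(basis, [0, 0, 1, 0, 0, 0])       # uniqueness vector for S
print(round(sim_vehicle, 3), round(sim_sulfur, 3))
```

Here the vehicle profile passes the test, while the sulfur uniqueness vector does not, because no retained factor is dominated by sulfur in this made-up system.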

In the analysis of the street dust, six sources were identified: soil, cement, tire wear, direct automobile exhaust, salt, and iron particles. The lead concentration of the motor vehicle source was found to be 15%, with a lead-to-bromine ratio of 0.39. This ratio is in good agreement with the values obtained by Dzubay et al. (65) for Los Angeles, CA freeways and within the range presented by Harrison and Sturges (66) in their extensive review of the literature. A comparison of the actual mass fractions with those calculated from the TTFA results shows that TTFA provided a good reproduction of the mass distribution, and the resulting source apportionments suggest that a substantial fraction of the urban roadway dust is anthropogenic in origin.

One of the principal advantages of TTFA is that it can identify the source composition profiles as they exist at the receptor site. The composition of the particles can change in transit from the source to the receptor, and approaches that provide these modified source profiles should improve the receptor model results. Chang et al. (63) have applied TTFA to an extensive set of data from St. Louis, MO to develop source composition profiles based on a subset selection process developed by Rheingrover and Gordon (67). They selected samples from a database that were heavily influenced by major sources of each element. These samples were identified according to the following criteria:

1. The concentration of the element in question satisfies $X > \bar{X} + Z\sigma$, where $\bar{X}$ is the average concentration of that element for each station and size fraction (coarse or fine), $Z$ is typically set at about three for most elements, and $\sigma$ is the standard deviation of the concentration of that element.

2.  The  standard  deviation  of  the  6  or  12  h  average 
wind  directions  for  most  samples,  or  minute 
averages  for  2  h  samples,  taken  during  intensive 
periods  is  less  than  20°. 

Samples that were strongly affected by emissions from a particular source were identified by the clustering of the mean wind directions for the selected sampling periods, with angles pointing toward that source.
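The two screening criteria can be sketched in a few lines. Because wind directions wrap around at north, the sketch measures their spread with the circular standard deviation; that choice, the helper names, and the example record are assumptions made for illustration, not details taken from Rheingrover and Gordon (67).

```python
import numpy as np

def circular_sd_deg(directions_deg):
    """Circular standard deviation of wind directions (degrees)."""
    a = np.deg2rad(np.asarray(directions_deg, dtype=float))
    r = np.hypot(np.mean(np.cos(a)), np.mean(np.sin(a)))   # mean resultant length
    r = np.clip(r, 1e-12, 1.0)                               # guard against round-off
    return float(np.rad2deg(np.sqrt(-2.0 * np.log(r))))

def select_impacted(conc, wind_dirs, z=3.0, max_wind_sd=20.0):
    """Indices of samples meeting both screening criteria for one element."""
    conc = np.asarray(conc, dtype=float)
    threshold = conc.mean() + z * conc.std(ddof=1)          # X > X_bar + Z*sigma
    return [i for i, (x, dirs) in enumerate(zip(conc, wind_dirs))
            if x > threshold and circular_sd_deg(dirs) < max_wind_sd]

# Hypothetical record for one element at one station and size fraction
conc = [1.0, 1.2, 0.9, 1.1, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9] * 2 + [10.0]
winds = [[180.0, 190.0]] * 20 + [[350.0, 355.0, 5.0, 10.0]]  # last one wraps past north
print(select_impacted(conc, winds))                          # [20]
```

A naive arithmetic standard deviation of 350°, 355°, 5°, and 10° would be about 190°, which is why the circular form is used here.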

A number of studies of multivariate receptor models have used the database from the Regional Air Pollution Study (RAPS) of St. Louis, MO. In the RAPS program, automated dichotomous samplers were operated over a 2-year period at 10 sites in the St. Louis metropolitan area. Fig. 2 shows the locations of the 10 RAPS sampling stations. Ambient aerosol samples were collected in fine (< 2.4 µm) and coarse (2.4-20 µm) fractions. Samples were analyzed at the Lawrence Berkeley Laboratory for total mass by beta-gauge measurements and for 27 elements by XRF. The RAPS database contains results for almost 35000 samples.

Rheingrover and Gordon (67) screened the RAPS database according to the criteria stated above. With wind trajectory analysis, specific emission sources could be identified even in cases where the sources were located very close together (67). A compilation of the selected impacted samples was made so that TTFA could be employed to obtain elemental profiles for these sources at the various receptor sites.

Thus, TTFA may be very useful in determining the concentration of lead in motor vehicle emissions as the mix of leaded fuel continues to change. Multivariate methods can provide considerable information regarding the sources of particles, including highway emissions, from the ambient data matrix alone. The TTFA method is a useful approach when source information for the area is lacking or suspect, and when there is uncertainty as to the identification of all of the sources contributing to the measured concentrations at the receptor site.

Further efforts to extend eigenvector analysis methods have recently been made by Henry and Kim (68). They have been examining ways to incorporate into the analysis the explicit physical constraints that are inherent in the mixture resolution problem. Through the use of linear programming methods, they are better able to define the feasible region in which the solution must lie. The solution space is limited because the elements of the source profiles must all be greater than or equal to zero (non-negative source profiles), and the mass contributions of the identified sources must also be greater than or equal to zero. Although there have been only limited applications of this expanded method, it offers an important additional tool for systems where a priori source profile data are not available. These methods provide a useful parallel analysis with CMB to help ensure that the profiles used are reasonable representations of the sources contributing to a given set of samples.
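Why such constraints are needed can be shown with a toy example. Any invertible transformation T turns one factorization X = G F into another, (G T^-1)(T F), that fits the data equally well; non-negativity of both contributions and profiles cuts this family down to a feasible region. The sketch below scans a one-parameter family of transformations for a hypothetical two-source system. It only illustrates the feasible-region idea and is not Henry and Kim's linear programming method.

```python
import numpy as np

def rotate(G, F, T):
    """Alternative factorization X = (G T^-1)(T F) for an invertible T."""
    return G @ np.linalg.inv(T), T @ F

def is_feasible(G, F, tol=1e-9):
    """True if all contributions and all profile elements are non-negative."""
    return bool((G >= -tol).all() and (F >= -tol).all())

# Hypothetical 2-source system: 3 samples (rows of G), 3 elements (columns of F)
G0 = np.array([[1.0, 0.2], [0.3, 1.0], [1.0, 1.0]])
F0 = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]])
X = G0 @ F0

feasible = []
for a in np.linspace(-0.5, 0.5, 101):          # one-parameter family of transformations
    T = np.array([[1.0, a], [0.0, 1.0]])
    G, F = rotate(G0, F0, T)
    assert np.allclose(G @ F, X)               # every member fits the data equally well
    if is_feasible(G, F):
        feasible.append(a)
print(min(feasible), max(feasible))            # roughly 0.0 and 0.2
```

In this made-up case the profiles stay non-negative only for a ≥ 0 and the contributions only for a ≤ 0.2, so the data alone leave a band of equally good, physically admissible solutions.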


4.4  Illustrative  example 


4.4.1  Data  description 

To demonstrate the use of TTFA for resolving the sources of urban aerosols, TTFA will be applied to a compositional data set obtained from aerosol samples collected during the RAPS program in St. Louis, MO (60). The data for the samples collected during July and August 1976 at station 112 were selected for the TTFA process. Station 112 was located near Francis Field, the football stadium on the campus of Washington University, west of downtown St. Louis, MO.

During the 62 days of July and August, filters were changed at 12 h intervals, producing a total of 124 samples in each of the fine and coarse fractions. Data were missing for 24 pairs of samples, leaving a total of 100 pairs of coarse and fine fraction samples. Of the 27 elements determined for each sample, 10 elements had a majority of their determinations below the detection limits. Since a complete and accurate data set is required to perform a factor analysis, these 10 elements were eliminated from the analysis. For example, arsenic was excluded because almost all of its values were below the detection limits. Arsenic determinations by XRF are often unreliable because of an interference between the arsenic K X-ray and the lead L X-ray. A neutron activation analysis of these samples would produce better arsenic determinations. Reliable arsenic data may be important for differentiating coal flyash from crustal material, two materials with very similar source profiles. The low percentage of the sample mass represented by the measured elements can lead to distortions in the scaling factors produced by the multiple regression analysis. The remaining mass consists primarily of hydrogen, oxygen, nitrogen, and carbon. Although no measurements of carbon are included in the RAPS data, that portion of the sample mass must still be accounted for by the resolved sources. To produce the best possible source resolutions, it is vital to have accurate measurements of the mass of total suspended particles (TSP) as well as determinations for as many elements as possible.
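The element-screening rule described above (drop any element for which most determinations fall below the detection limit) is simple to express. In the following sketch the function name, the 50% cut-off parameter, and the small data set are hypothetical.

```python
import numpy as np

def usable_elements(values, detection_limits, max_bdl_fraction=0.5):
    """Indices of elements whose fraction of values below the detection limit
    does not exceed max_bdl_fraction (i.e. not a majority below the limit)."""
    values = np.asarray(values, dtype=float)            # samples x elements
    dl = np.asarray(detection_limits, dtype=float)      # one limit per element
    below = values < dl                                  # broadcast over samples
    return np.flatnonzero(below.mean(axis=0) <= max_bdl_fraction)

# Hypothetical data: 4 samples x 3 elements
values = [[5.0, 0.1, 1.0],
          [6.0, 0.2, 0.1],
          [7.0, 2.0, 2.0],
          [8.0, 0.1, 0.05]]
print(usable_elements(values, detection_limits=[1.0, 1.0, 0.5]).tolist())   # [0, 2]
```

Element 1 is dropped because three of its four values are below its limit; element 2, with exactly half below, is kept under this particular reading of "a majority".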

The fine and coarse samples were analyzed separately, and only the fine-fraction results are reported here. In this target transformation analysis, a set of potential source profiles was assembled from the literature for use as initial test vectors. The set of unique vectors (test vectors in which a single element is set to one and all others to zero) was also tested.

4.4.2  Results

The eigenvector analysis provided the results presented in Table 2. Examination of the eigenvalues suggests the presence of 4 major sources, possibly 2 weak sources, and noise. To begin the analysis, a 4-vector solution was obtained. The iteratively refined source profiles are given in Table 3. Based on their apparent elemental composition, the first 3 vectors can easily be identified as motor vehicles (Pb, Br), regional sulfate, and soil/flyash (Si, Al).
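Table 2 lists the Exner function alongside each eigenvalue. A minimal sketch of how such a column can be computed, assuming the common definition ε = [Σ(x − x̂)² / Σ(x − x̄)²]^½ with x̂ the rank-k reproduction of the data and x̄ the grand mean. The synthetic 100 × 17 data set only mimics the dimensions of the Site 112 data and is not the RAPS data.

```python
import numpy as np

def exner(X, k):
    """Exner function for a k-factor reproduction of the data matrix X.

    Assumes eps = sqrt(sum((x - x_hat)**2) / sum((x - x_bar)**2)), with x_hat the
    rank-k least-squares reproduction and x_bar the grand mean of X.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_hat = (U[:, :k] * s[:k]) @ Vt[:k]
    return float(np.sqrt(np.sum((X - X_hat) ** 2) / np.sum((X - X.mean()) ** 2)))

rng = np.random.default_rng(5)
# Hypothetical data: 100 samples x 17 elements from 4 sources plus noise
X = rng.random((100, 4)) @ rng.random((4, 17)) + rng.normal(0.0, 0.002, (100, 17))
values = [exner(X, k) for k in range(1, 8)]
print(np.round(values, 4))   # drops steeply up to k = 4, then levels off
```

A sharp drop followed by a plateau is one of the signals used, together with the eigenvalues and χ², when judging how many factors to retain.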

However, the origin of the fourth vector, which showed high K, Zn, Ba, and Sr, was not initially obvious. The resulting mass loadings were then calculated, and the only significant values were for the sampling periods from noon to midnight on July 4 and from midnight to noon on July 5. This was July 4, 1976, and there was a bicentennial fireworks display at this location. Thus, these two highly influenced samples changed the whole analysis.

To illustrate this further, Table 4 gives the average elemental composition of the fine fraction samples with and
TABLE 2

Results of eigenvector analysis of July and August 1976 fine fraction data at Site 112 in St. Louis, MO

Factor   Eigenvalue   χ²    Exner   Average % error
1        90.          210   0.324   204
2        5.0          156   0.214   164
3        1.7          65    0.141   129
4        1.3          63    0.064   93
5        0.16         55    0.047   72
6        0.09         26    0.034   68
7        0.03         24    0.027   67
8        0.02         24    0.021   58
9        0.02         15    0.016   49

TABLE 3

Refined source profiles for the 4-source solution at RAPS Site 112, July-August 1976

Element   Motor vehicle   Sulfate   Flyash/soil   Fireworks
Al        3.              0.9       62.           60
Si        0.0             28        140           0.0
S         0.0             232.      14.           26
Cl        5.2             16        0.31          19.
K         0.0             0.06      43.           580
Ca        12.             0.006     17.           0.27
Ti        28              18        23            0.0
Mn        13              0.1       0.8           36
Fe        58              38        38.           9.
Ni        0.2             0.06      0.05          0.3
Cu        1.9             0.2       0.03          46
Zn        98              1.4       0.0           24.
Se        0.1             0.1       0.0           0.01
Br        26.             0.0       2.7           2.
Sr        0.0             0.0       0.9           12
Ba        1.45            0.3       0.8           15.
Pb        105.            8.        38            0.0

without the July 4 and 5 samples included. It can be seen that these two samples from the 100-sample set have changed the average value of K by a factor of 2 and the average Sr by a

TABLE 4

Comparison of data with and without samples from July 4 and 5, RAPS Station 112, July and August 1976 fine fraction

          Mean ± S.D. (ng/m³)
Element   With          Without
Al        220 ± 30      200 ± 30
Si        440 ± 60      450 ± 60
S         4370 ± 310    4360 ± 320
Cl        90 ± 10       80 ± 9
K         320 ± 130     150 ± 9
Ca        110 ± 10      110 ± 10
Ti        63 ± 13       64 ± 13
Mn        17 ± 3        17 ± 3
Fe        220 ± 20      220 ± 20
Ni        2.3 ± 0.2     2.3 ± 0.2
Cu        16 ± 3        15 ± 3
Zn        78 ± 8        75 ± 8
Se        2.7 ± 0.2     2.7 ± 0.2
Br        140 ± 9       130 ± 8
Sr        5 ± 4         1.1 ± 0.1
Ba        19 ± 5        15 ± 4
Pb        730 ± 50      720 ± 50

TABLE 5

Results of eigenvector analysis of July and August 1976 fine fraction data at Site 112 in St. Louis, MO, excluding July 4 and 5 data

Factor   Eigenvalue   χ²    Exner   Average % error
1        87.          210   0.304   197
2        4.9          152   0.304   197
3        2.0          57    0.070   123
4        0.2          42    0.050   98
5        0.1          26    0.037   73
6        0.1          25    0.029   69
7        0.02         26    0.023   69
8        0.02         17    0.019   67
9        0.01         16    0.015   53

factor of 5. Thus, TTFA can find strong, unusual events in a large, complex data set. After the samples from July 4 and 5 were dropped, the analysis was repeated; the results are presented in Table 5. Now there are 3 strong factors, 2 weaker ones, and a continuum. Thus, a 5-factor solution was sought. These results are presented in Table 6.
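The size of the July 4 and 5 effect can be checked from the Table 4 means with simple arithmetic. The sketch below computes the ratio of the means and, because the 100-sample mean equals a weighted average of the 98 retained samples and the 2 removed ones, the average implied for the two fireworks samples. The helper name is hypothetical, and since the tabulated means are rounded, the implied values are approximate.

```python
# Means (ng/m^3) taken from Table 4
mean_with = {"K": 320.0, "Sr": 5.0}      # all 100 samples
mean_without = {"K": 150.0, "Sr": 1.1}   # 98 samples, July 4 and 5 removed

def implied_mean_of_removed(m_all, n_all, m_kept, n_kept):
    """Average of the removed samples implied by the two overall means."""
    return (m_all * n_all - m_kept * n_kept) / (n_all - n_kept)

for el in ("K", "Sr"):
    factor = mean_with[el] / mean_without[el]
    removed = implied_mean_of_removed(mean_with[el], 100, mean_without[el], 98)
    print(f"{el}: factor {factor:.1f}, removed-sample mean ~{removed:.0f} ng/m^3")
```

The factors come out at about 2.1 for K and 4.5 for Sr, and the two removed samples must have averaged several thousand ng/m³ of K, far above any other sample in the set.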

The target transformation analysis for the fine fraction without July 4 and 5 indicated the presence of a motor vehicle source, a sulfate source, a soil or flyash source, a paint-pigment source, and a refuse source. The presence of the sulfate, paint-pigment, and refuse factors was determined by the uniqueness test for the elements sulfur, titanium, and zinc, respectively. In the paint-pigment factor, titanium was found to be associated with sulfur, calcium, iron, and barium. The paint-pigment plant used iron titanate as its input material, and the profile obtained in this analysis appears to be realistic. The zinc factor, associated with chlorine, potassium, iron, and lead, is attributed to refuse-incinerator emissions. However, a high chlorine concentration is usually associated with particles from refuse incinerators (69,70). This factor might also represent particles from zinc and/or lead smelters.

TABLE 6

Refined source profiles (mg/g), RAPS Station 112, July and August 1976, fine fraction without July 4 and 5

Element   Motor vehicle   Sulfate   Soil/flyash   Paint   Refuse
Al        5.              1.1       53.           0.0     0.0
Si        0.0             1.9       130.          0.0     7.
S         0.2             240.      19.           6.      0.0
Cl        2.4             1.1       0.0           46      22.
K         1.4             1.6       15.           5.7     48.
Ca        n.              0.0       16.           34.     1.2
Ti        0.0             0.7       2.5           no.     0.0
Mn        0.0             0.0       0.7           4.S     86
Fe        0.0             M         36.           90.     36.
Ni        oos             0.04      0.042         0.011   0.7
Cu        0.6             0.01      0.0           0.0     8.7
Zn        0.8             0.0       0.0           3.7     65.
Se        0.1             0.1       0.001         0.2     0.2
Br        30.             0.3       2.5           0.0     0.05
Sr        0.09            0.01      0.15          0.1     0.001
Ba        0.7             0.035     0.07          28.     0.5
Pb        107.            6.5       5.            0.0     46.

The results of this analysis provide quite reasonable fits to the elemental concentrations and to the fine mass concentrations for this system. Thus, TTFA provided a plausible resolution of source types and their concentrations, although specific individual sources were not identified and quantitatively apportioned. Studies with other data sets suggest that TTFA is typically able to identify 5 to 7 source types, as long as they are reasonably distinct from one another.
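The mass fit mentioned above relies on regressing the measured mass of each sample on the source strengths from the factor analysis (the weighted least-squares step described earlier). A minimal sketch, assuming per-sample mass uncertainties are available for the weights; the function name, scores, scaling factors, and 10% uncertainties are hypothetical.

```python
import numpy as np

def mass_scaling_factors(scores, total_mass, mass_sd=None):
    """Weighted least-squares fit of sample mass to source scores (sketch).

    scores: (samples x sources) source strengths from the factor analysis.
    Returns the scaling factors and the per-sample mass attributed to each source.
    """
    S = np.asarray(scores, dtype=float)
    m = np.asarray(total_mass, dtype=float)
    if mass_sd is None:
        w = np.ones_like(m)
    else:
        w = 1.0 / np.asarray(mass_sd, dtype=float) ** 2
    sw = np.sqrt(w)                                   # row weights for WLS
    zeta, *_ = np.linalg.lstsq(S * sw[:, None], m * sw, rcond=None)
    return zeta, S * zeta                             # column j: mass from source j

rng = np.random.default_rng(6)
scores = rng.random((30, 3))                          # hypothetical source scores
true_zeta = np.array([2.0, 5.0, 1.0])                 # hypothetical scaling factors
mass = scores @ true_zeta                             # noise-free "measured" mass
zeta, contrib = mass_scaling_factors(scores, mass, mass_sd=0.1 * mass)
print(np.round(zeta, 3))
```

With real data the fit is never exact, and, as noted for the RAPS data, unmeasured constituents such as carbon can distort the scaling factors.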


5  SUMMARY 

In this paper, several of the active areas of receptor modeling have been introduced. Their ability to determine the sources of particles in the air can be very useful in developing air quality management strategies, and they can potentially become enforcement tools as well. Since receptor models are of necessity retrospective in nature, another important use is in the calibration and testing of prognostic dispersion models, so that predicted changes in air quality can serve as a more reliable basis for management decisions. The field of receptor modeling has grown and developed rapidly during the last several years and can be expected to continue to do so as methods are improved and new applications are discovered.
Abstract


Gleser, L.J., 1991. Measurement error models. Chemometrics and Intelligent Laboratory Systems, 10: 45-57.

An overview is given of linear measurement error models. Such models appear in many forms, including errors-in-variables regression and factor analysis, but are mathematically related to each other. Of particular interest to chemists are mass balance receptor models in which source profiles are estimated with error. A general model is given for errors in profiles, and the attention of chemists is directed toward recent advances in statistical model fitting and numerical analysis which may be of use in estimating source contributions.


1  INTRODUCTION 

Measurement error models have been applied in virtually every area of science and technology. Perhaps most familiar to chemists are the models of factor analysis and errors-in-variables regression models, in which the predictors (independent variables) are observed subject to random errors of measurement.

Although measurement error models can be either linear or nonlinear, in the present paper attention is confined to linear measurement error models. Section 2 introduces such models, indicating the wide variety of mathematical forms in which these models can be stated. Some basic concepts, principles and terminology are introduced, with the goal of facilitating access by chemists to the broad statistical literature dealing with methods for fitting and analyzing measurement error models.

A brief survey of available statistical estimation methods, and related computer software, is given in Section 3. Particularly emphasized is an approach, called ‘correction for attenuation’ by psychometricians, which adjusts classical regression estimators (which ignore measurement errors in the predictor variables) for errors in the predictors. Besides permitting use of standard algorithms (both classical and more recent robust methods), this approach also has the merit of focusing the attention of users on ways to obtain and use available information about the sources and magnitudes of the measurement errors.
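The idea behind the correction for attenuation can be illustrated with a short simulation. The sketch below assumes a single predictor, normally distributed latent values and errors, and a predictor error variance `sigma_u2` that is known in advance (for instance from calibration data); the parameter values and function names are illustrative, not taken from the paper. It compares the classical least-squares slope, which is shrunk toward zero by the factor var(w)/var(X), with the same slope rescaled by var(X)/(var(X) - sigma_u2).

```python
import random


def mean(v):
    return sum(v) / len(v)


def ols_slope(x, z):
    """Classical least-squares slope of z on x (ignores the error in x)."""
    mx, mz = mean(x), mean(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxz / sxx


def corrected_slope(x, z, sigma_u2):
    """Undo attenuation: multiply the OLS slope by var(X) / (var(X) - sigma_u2).

    var(X) - sigma_u2 estimates the variance of the latent predictor w, so the
    factor reverses the shrinkage var(w) / var(X) suffered by the OLS slope.
    """
    mx = mean(x)
    var_x = sum((a - mx) ** 2 for a in x) / (len(x) - 1)
    return ols_slope(x, z) * var_x / (var_x - sigma_u2)


rng = random.Random(1)
n, alpha, beta = 20000, 1.0, 2.0
sigma_u2 = 1.0                                              # assumed known
w = [rng.gauss(0.0, 1.0) for _ in range(n)]                 # latent predictor
x = [wi + rng.gauss(0.0, sigma_u2 ** 0.5) for wi in w]      # observed predictor
z = [alpha + beta * wi + rng.gauss(0.0, 0.5) for wi in w]   # response

naive = ols_slope(x, z)                   # near beta * 1 / (1 + 1) = 1.0
fixed = corrected_slope(x, z, sigma_u2)   # near beta = 2.0
print(f"naive slope {naive:.3f}, corrected slope {fixed:.3f}")
```

The correction only makes sense when var(X) exceeds the assumed error variance; if it does not, the assumed `sigma_u2` is inconsistent with the data.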

In environmental studies, chemists have used both factor analysis and errors-in-variables regression (which they call effective variance calculation) to identify source contributions to environmental pollution (1). The statistical models used in these contexts stem from linear mass balance equations that relate the concentrations of certain ‘aerosol properties’ (e.g., chemical compounds) at a receptor to the total mass contributions from the sources. These applications will be used throughout the paper as concrete examples of linear measurement error models. In Section 4, some suggestions for possible improvements in the models and methods of statistical analysis used in this area will be presented.


2  LINEAR  MEASUREMENT  ERROR  MODELS 

Measurement error models have in common their attempt to describe situations in which the observed variables $Y$ (denoted by capital letters) are of interest only because they reflect certain unobservable, or latent, variables $y$ (denoted by corresponding lower case letters) that are measured by $Y$ subject to random error. That is,

$$Y = y + e$$

where $e$ is a random error of measurement having mean 0 and a distribution functionally unrelated to the value of $y$. For the $i$th experimental unit or time period, we may have obtained measurements $Y_i^{(1)}, \ldots, Y_i^{(m)}$ on $m$ latent variables $y_i^{(1)}, \ldots, y_i^{(m)}$, $i = 1, \ldots, n$. Let $Y_i = (Y_i^{(1)}, \ldots, Y_i^{(m)})'$ and $y_i = (y_i^{(1)}, \ldots, y_i^{(m)})'$ be $m$-dimensional column vectors containing the observed and latent variables, respectively. Then

$$Y_i = y_i + e_i, \qquad i = 1, \ldots, n, \tag{1}$$

where the vectors $e_i$ of measurement errors have mean vector 0 and distributions functionally unrelated to the values of the latent variables $y_i$. It is usually assumed that the error vectors $e_i$ are independently distributed.

In a linear measurement error model, the elements of each latent vector $y_i$ are assumed to satisfy a common set of linear relationships. Geometrically, this means that the $y_i$, represented as points in $m$-dimensional space, all lie in a hyperplane $\mathcal{H}$ of dimension $r$, $r < m$, passing through an origin $a$. The dimension $r$ of $\mathcal{H}$ can be either known or unknown; in the latter case, $r$ is a basic parameter of the model.

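Three commonly used ways to restate the above geometric description of the model in an algebraic (parameterized) form are the following:

$$y_i = A f_i + a, \tag{2}$$

$$y_i = \begin{pmatrix} y_{i1} \\ y_{i2} \end{pmatrix} = \begin{pmatrix} B \\ I_r \end{pmatrix} y_{i2} + \begin{pmatrix} a \\ 0 \end{pmatrix}, \tag{3}$$

$$\Gamma y_i = \gamma. \tag{4}$$

In eqn. (3), $I_r$ is the $r$-dimensional identity matrix.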

The model (2) is the familiar model of factor analysis. The columns of the $m \times r$ factor loading matrix $A$ are a basis for the hyperplane $\mathcal{H}$, while the factor score vectors $f_i$ contain the coefficients representing each $y_i$ as a linear combination of the basis elements (columns of $A$).

The model (3) is the model of errors-in-variables regression. Here, $r$ elements of each $y_i$ serve as predictor (independent) variables for the remaining $m - r$ variables. By renumbering components, we can allow the predictor variables chosen to form the $r$-dimensional subvector $y_{i2}$ containing the last $r$ elements of $y_i$. The $(m - r) \times r$ slope matrix $B$ and the intercept vector $a$ are basic parameters of the model.

Model (4) is a more symmetric way of writing a set of linear equations relating the elements of $y_i$, in that no distinction is made between independent and dependent variables (as was done in model (3)). This model is often referred to as an implicit linear functional relationship model. The coefficient matrix $\Gamma$, which is $(m - r) \times m$ and has full rank $m - r$, and the vector $\gamma$ are basic parameters of the model.
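The statement that models (2), (3) and (4) describe the same hyperplane can be checked on a small example. The sketch below uses arbitrary numbers (m = 3, r = 2) and hypothetical helper names: starting from a slope matrix B and intercept a for model (3), it builds the loadings A = (B; I_r), with f_i = y_{i2} and origin (a; 0), for model (2), and Γ = (I_{m-r}, -B), γ = a for model (4), then confirms that one latent vector satisfies all three forms.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def identity(k):
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


m, r = 3, 2
B = [[0.5, -1.5]]       # (m - r) x r slope matrix of model (3)
a = [4.0]               # (m - r)-dimensional intercept of model (3)

# Model (3): choose the predictors y_i2; the dependent part y_i1 follows.
y_i2 = [2.0, 1.0]
y_i1 = [u + v for u, v in zip(matvec(B, y_i2), a)]
y_i = y_i1 + y_i2                                    # stacked (y_i1', y_i2')'

# Model (2): loadings A = (B; I_r), factor f_i = y_i2, origin (a; 0).
A = B + identity(r)
y_from_2 = [u + v for u, v in zip(matvec(A, y_i2), a + [0.0] * r)]

# Model (4): Gamma = (I_{m-r}, -B) and gamma = a, so that Gamma y_i = gamma.
Gamma = [row_i + [-b for b in row_b] for row_i, row_b in zip(identity(m - r), B)]
gamma = a

print(y_i, y_from_2, matvec(Gamma, y_i), gamma)
```

This is only one of many equivalent choices of A and Γ, which is exactly the nonuniqueness discussed in Section 2.1.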

Model (4) often results from consideration of families of simultaneous stochastic equations. In such models, observations $X_{it}$, $i = 1, \ldots, I$, are made at each of $T$ time points $t$. It is assumed that these observations satisfy a set of linear equations

$$\sum_{i=1}^{I} a_{ji} X_{it} = f_{jt}, \qquad j = 1, \ldots, J, \quad t = 1, \ldots, T, \tag{5}$$

where $f_t = (f_{1t}, \ldots, f_{Jt})'$, $t = 1, \ldots, T$, are independent random vectors having mean vector 0 and a common distribution. The $X_{it}$ are quantities internal (endogenous) to a given system (in econometrics, an economic system), while the $f_{jt}$ represent random influences external (exogenous) to the system that account for the linear combinations on the left side of eqn. (5) not being exactly equal to 0. If $J < I$ and the $J \times I$ matrix $A = ((a_{ji}))$ has full rank $J$, we can renumber indices so that

$$A = (A_1, A_2), \quad A_1 : J \times J \text{ of rank } J, \qquad Z_t = (X_{1t}, \ldots, X_{Jt})', \quad W_t = (X_{J+1,t}, \ldots, X_{It})',$$

and then (5) becomes

$$A_1 Z_t + A_2 W_t = f_t,$$

or

$$Z_t = -A_1^{-1} A_2 W_t + A_1^{-1} f_t \equiv \Pi W_t + f_t^{*}. \tag{6}$$

Using classical multivariate linear regression methods, we can find an estimator $\hat{\Pi}$ of $\Pi$. To estimate the original matrix $A$ of coefficients, it is necessary to impose restrictions. This is usually done by identifying certain of the $a_{ji}$ as being equal to 0. Such restrictions on $A_2$ imply that certain elements of $A_1 \Pi = -A_2$ are zero. Since $A_1$ is unknown, this results in an implicit linear functional relationship model for $\Pi$. Here, the columns of $\hat{\Pi}$ become the observed $Y_i$, and the columns of $\Pi$ are the latent vectors $y_i$. Thus, the model (4) is applied to estimated regression slope matrices in a classical regression model. It is in this manner that measurement error models often appear in the econometrics literature. The potential application of similar stochastic equation models (5), and the resulting linear measurement error models (4), in chemistry and other physical sciences should be apparent. In these models some of the $X_{it}$ variables can be measurements of variables obtained at times prior to $t$ (that is, lagged values), in which case (5) has the form of an ARIMA time series model. A thorough discussion of linear simultaneous stochastic equation models, and the related linear measurement error models, can be found in refs. 2-4.
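The passage from the structural equations (5) to the reduced form (6) is a small matrix computation. The sketch below uses a hypothetical system of J = 2 equations in I = 3 variables with arbitrary coefficients and exact rational arithmetic. It computes Π = -A_1^{-1} A_2 and checks that A_1 Π = -A_2, so that a zero imposed on an element of A_2 forces the corresponding element of A_1 Π to vanish.

```python
from fractions import Fraction as F


def inverse_2x2(M):
    """Exact inverse of a 2 x 2 matrix stored as a list of rows."""
    (p, q), (s, t) = M
    det = p * t - q * s
    if det == 0:
        raise ValueError("A_1 must be invertible")
    return [[t / det, -q / det], [-s / det, p / det]]


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


# J = 2 equations in I = 3 variables; A = (A_1, A_2) with A_1 square (J x J).
A1 = [[F(2), F(1)], [F(1), F(3)]]   # coefficients of Z_t = (X_1t, X_2t)'
A2 = [[F(4)], [F(0)]]               # coefficients of W_t = (X_3t); a zero imposed

Pi = [[-v for v in row] for row in matmul(inverse_2x2(A1), A2)]   # -A_1^{-1} A_2
print(Pi)                # reduced-form slopes of Z_t on W_t
print(matmul(A1, Pi))    # equals -A_2, so its second element is 0
```

In practice Π would be replaced by the regression estimate $\hat{\Pi}$, and the zero restrictions become the implicit linear relationships of model (4).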

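In calibration models, estimated regression slopes can again serve as observed variables, with true slopes acting as latent variables. For example, suppose that we fit a linear model

$$Z_i = \alpha + \beta W_i + \varepsilon_i, \qquad i = 1, \ldots, n,$$

and obtain the least squares estimators $\hat{\alpha}$, $\hat{\beta}$ of the intercept and slope. A new observation $Z$ is obtained, and we wish to estimate the value of $W$ that led to $Z$. Then

$$\begin{pmatrix} Z \\ \hat{\alpha} \\ \hat{\beta} \end{pmatrix} = \begin{pmatrix} \alpha + \beta W \\ \alpha \\ \beta \end{pmatrix} + \begin{pmatrix} \varepsilon \\ \hat{\alpha} - \alpha \\ \hat{\beta} - \beta \end{pmatrix}$$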

has the form of a measurement error model with $Y = (Z, \hat{\alpha}, \hat{\beta})'$, $y = (\alpha + \beta W, \alpha, \beta)'$. One way to represent the linear restriction is in the form (4):

$$(1, 0, -W)\, y = \alpha.$$
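The classical calibration estimate that this model formalizes can be sketched as follows, assuming simple least-squares calibration with a single predictor. The calibration standards, readings and function names are invented for illustration. The code fits Z on W, inverts the fitted line to estimate the W behind a new reading, and checks that the latent vector y = (α + βW, α, β)' satisfies (1, 0, -W) y = α.

```python
def least_squares_line(w, z):
    """Least-squares intercept and slope of z on w."""
    n = len(w)
    mw, mz = sum(w) / n, sum(z) / n
    sxx = sum((a - mw) ** 2 for a in w)
    sxz = sum((a - mw) * (b - mz) for a, b in zip(w, z))
    beta_hat = sxz / sxx
    return mz - beta_hat * mw, beta_hat


def inverse_prediction(z_new, alpha_hat, beta_hat):
    """Classical calibration estimate of the W that produced the reading z_new."""
    return (z_new - alpha_hat) / beta_hat


# Hypothetical calibration standards (known W) and instrument readings (Z).
w_std = [0.0, 1.0, 2.0, 3.0, 4.0]
z_std = [0.1, 2.1, 3.9, 6.1, 7.9]
alpha_hat, beta_hat = least_squares_line(w_std, z_std)
w_hat = inverse_prediction(5.0, alpha_hat, beta_hat)   # estimate for a new Z = 5.0

# Latent vector y = (alpha + beta W, alpha, beta)' satisfies (1, 0, -W) y = alpha.
alpha, beta, W = 0.1, 2.0, 1.5
y = (alpha + beta * W, alpha, beta)
check = 1 * y[0] + 0 * y[1] - W * y[2]
print(round(alpha_hat, 4), round(beta_hat, 4), round(w_hat, 4), round(check, 4))
```

Because $\hat{\alpha}$ and $\hat{\beta}$ carry estimation error, `w_hat` inherits errors in both the "dependent" and the "slope" coordinates, which is why the measurement error formulation fits.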

Calibrations, and thus calibration models, are widely used in the physical sciences and engineering (5,6). Although most calibrations involve estimation of a single predictor $W$ from a single dependent variable $Z$ (perhaps on many occasions), multivariate calibration models are also used (7,8). The calibration literature tends to emphasize methods based on classical linear (or approximately linear) multiple regression models, so that the connection to measurement error models is not widely known. Consequently, the calibration and measurement error model literatures have tended to develop in parallel.

It should be added that predictor variables in the physical sciences, and also in the medical and behavioral sciences, are often measured indirectly through calibration. This is a source of measurement error in regression experiments that is frequently overlooked, at the cost of a possibly substantial bias in conclusions (9). On the other hand, calibration experiments provide a useful way to assess measurement errors in predictors (Section 3).

In mass balance models, two distinct applications of linear measurement error models arise. First, we may have measurements $C_{it}$ of concentrations of ‘aerosol property’ $i$ at a receptor at time $t$ for $m$ properties ($i = 1, \ldots, m$) and $T$ times ($t = 1, \ldots, T$). The true concentrations $c_{it}$ are thought to result from the mass contributions $s_{jt}$ of $r$ sources, as represented by the linear mass balance model:

$$c_{it} = \sum_{j=1}^{r} a_{ij} s_{jt}, \qquad i = 1, \ldots, m; \quad t = 1, \ldots, T. \tag{7}$$

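Letting

$$Y_t = (C_{1t}, \ldots, C_{mt})', \quad y_t = (c_{1t}, \ldots, c_{mt})', \quad f_t = (s_{1t}, \ldots, s_{rt})', \quad A = ((a_{ij})),$$

we have the factor analysis model

$$Y_t = y_t + e_t, \quad E(e_t) = 0, \quad y_t = A f_t. \tag{8}$$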

The intercept term $a$ in model (2) does not appear here since it is usually assumed that all variables are centered at their sample means. (The variables are also usually standardized by their standard deviations, a practice about which we will have more to say later.) In applications of this model, the coefficients $a_{ij}$ of the mass balance equations and the number $r$ of sources are usually assumed to be unknown.

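A second application of linear measurement error models to mass balance problems occurs when we know the number $r$ of sources, and also have unbiased measurements (or other similar prior information) for the coefficients $a_{ij}$ in eqn. (7). Here, only one measurement in time is usually taken, so that the ‘aerosol properties’ are treated as experimental units. That is, it is assumed that we observe

$$Y_i = \begin{pmatrix} C_i \\ A_{i1} \\ \vdots \\ A_{ir} \end{pmatrix} = \begin{pmatrix} c_i \\ a_{i1} \\ \vdots \\ a_{ir} \end{pmatrix} + \begin{pmatrix} e_{i1} \\ e_{i2} \\ \vdots \\ e_{i,r+1} \end{pmatrix}, \qquad i = 1, \ldots, m, \tag{9}$$

where $E(e_{ij}) = 0$, $j = 1, \ldots, r+1$, and

$$c_i = \sum_{j=1}^{r} a_{ij} s_j = B\,(a_{i1}, \ldots, a_{ir})', \qquad i = 1, \ldots, m, \qquad B = (s_1, \ldots, s_r). \tag{10}$$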
This model has the errors-in-variables regression form (3), with the slope matrix $B$ giving the mass contributions $s_j$, $j = 1, \ldots, r$, of the sources. Again, the intercept $a$ in model (3) does not appear, since all measured variables are centered at their sample means.
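The ‘effective variance calculation’ that chemists apply to this second mass balance model can be sketched as iteratively reweighted least squares: each property $i$ is weighted by the inverse of $\sigma^2_{C_i} + \sum_j s_j^2 \sigma^2_{A_{ij}}$, which is the variance of $C_i - \sum_j A_{ij} s_j$ when concentrations and profiles carry independent errors. This is a simplified sketch under stated assumptions (known, independent error variances; a fixed number of iterations; no intercept because the model is centered), not the author's method or any particular receptor-modeling package, and the profile values are hypothetical.

```python
def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [vi] for row, vi in zip(M, v)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for k in range(col + 1, n):
            factor = aug[k][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[k][j] -= factor * aug[col][j]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        tail = sum(aug[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (aug[k][n] - tail) / aug[k][k]
    return x


def effective_variance(C, A, var_C, var_A, iterations=20):
    """Effective-variance estimate of source contributions s in C_i ~ sum_j A_ij s_j.

    C: measured concentrations (length m); A: measured source profiles (m x r);
    var_C, var_A: error variances of C and A, assumed known and independent.
    """
    m, r = len(C), len(A[0])
    s = [0.0] * r          # first pass: weights 1 / var_C (ordinary weighted LS)
    for _ in range(iterations):
        # Effective variance of C_i - sum_j A_ij s_j at the current s.
        w = [1.0 / (var_C[i] + sum(s[j] ** 2 * var_A[i][j] for j in range(r)))
             for i in range(m)]
        # Weighted normal equations (A' W A) s = A' W C.
        AtWA = [[sum(w[i] * A[i][j] * A[i][k] for i in range(m)) for k in range(r)]
                for j in range(r)]
        AtWC = [sum(w[i] * A[i][j] * C[i] for i in range(m)) for j in range(r)]
        s = solve(AtWA, AtWC)
    return s


# Hypothetical profiles for r = 2 sources and m = 4 aerosol properties.
A = [[0.30, 0.05], [0.10, 0.40], [0.20, 0.20], [0.05, 0.10]]
s_true = [10.0, 5.0]
C = [sum(a_ij * s_j for a_ij, s_j in zip(row, s_true)) for row in A]   # noise-free
var_C = [0.01] * 4
var_A = [[1e-4, 1e-4] for _ in range(4)]

s_hat = effective_variance(C, A, var_C, var_A)
print([round(v, 6) for v in s_hat])
```

With noise-free inputs any positive weights recover `s_true`; the weighting only matters once measurement errors are added to `C` and `A`.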

2.1  Model  uniqueness 

Although the idea of linear relationships among the latent variables is intuitively clear (with a concrete geometric interpretation), each of the models used to represent or parameterize the idea has elements of arbitrariness. First, note that the parameterizations in two of the models ((2) and (4)) that we have described are not uniquely defined. For example, in the factor analysis model (2), we can replace $A$ by $A^* = AT$ and $f_i$ by $f_i^* = T^{-1} f_i$ for any $r$-dimensional invertible matrix $T$ without changing the validity of the model:

$$y_i = A f_i + a = A T T^{-1} f_i + a = A^* f_i^* + a.$$

(Since the columns of $A$ are a basis for the hyperplane $\mathcal{H}$, and bases of vector spaces are not unique, this fact should not be surprising.) In the literature, this nonuniqueness problem is called factor indeterminacy. One can impose restrictions on $A$ (and possibly other parameters of the model) to remove this indeterminacy, but such restrictions are exterior to the model (and data) and cannot be tested. Indeed, it is common for one set of restrictions to be imposed for computational convenience (usually to reduce the estimation problem to a type of principal components analysis), and then for investigators to search among the set of equivalent parameterizations of the fitted model for one which has meaning in the given context. (For example, the program VARIMAX searches to find permissible loadings in $A$ with maximum variability: each loading $|a_{ij}|$ is either near 0 or very large.) The extra searching that such exploratory factor analysis methods do among equivalent parameterizations of the model (2) in the attempt to find a ‘meaningful solution’ is not accounted for by customary indices of accuracy (large-sample variances and covariances of the estimators). It is entirely possible for two investigators starting with the same data and the same initial solution for the parameters to arrive at quite different ‘meaningful solutions’ (final fitted models). In confirmatory factor analysis, on the other hand, a set of restrictions is imposed a priori (usually based on previous experience with the variables being studied), regardless of computational convenience, and then such a model is fitted, and also tested against other less restrictive models (particularly models allowing a larger number of factors).
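Factor indeterminacy, $y_i = A f_i + a = (AT)(T^{-1} f_i) + a$, can be verified numerically. The sketch below uses exact rational arithmetic with an arbitrary 3 × 2 loading matrix A, factor score vector f_i, origin a and invertible 2 × 2 matrix T (all values and helper names are hypothetical), and confirms that the two parameterizations reproduce the same latent vector even though the loadings differ.

```python
from fractions import Fraction as F


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse_2x2(T):
    """Exact inverse of an invertible 2 x 2 matrix."""
    (p, q), (s, t) = T
    det = p * t - q * s
    return [[t / det, -q / det], [-s / det, p / det]]


def add(X, Y):
    return [[u + v for u, v in zip(rx, ry)] for rx, ry in zip(X, Y)]


A = [[F(1), F(2)], [F(0), F(1)], [F(3), F(-1)]]   # m x r loadings (m = 3, r = 2)
f = [[F(2)], [F(5)]]                              # factor scores f_i (column)
a = [[F(1)], [F(0)], [F(-2)]]                     # origin a of the hyperplane
T = [[F(1), F(1)], [F(0), F(2)]]                  # any invertible r x r matrix

A_star = matmul(A, T)                 # A* = A T
f_star = matmul(inverse_2x2(T), f)    # f* = T^{-1} f

y = add(matmul(A, f), a)
y_star = add(matmul(A_star, f_star), a)
print(y == y_star, A_star == A)       # same y_i from different loadings
```

Nothing in the data can distinguish `A` from `A_star`, which is why restrictions chosen to remove the indeterminacy cannot be tested.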

Similar comments about indeterminacy apply to model (4). Here, the coefficient matrix $\Gamma$ and the vector $\gamma$ can be replaced by $\Delta\Gamma$ and $\Delta\gamma$, for any $(m - r)$-dimensional invertible matrix $\Delta$, without affecting the validity of the equation defining the model. Again, restrictions needed to identify the parameters cannot be tested by the given data.

By imposing suitable extra-model restrictions, one can reduce both model (2) and model (4) to the errors-in-variables regression form (3). (This is intuitively clear from the fact that all three models describe the same geometric assumption that the latent vectors $y_i$ lie in the hyperplane $\mathcal{H}$.) Verification of this assertion can be found in refs. 3 and 10. However, even model (3) requires prior separation of the elements of $y_i$ into a vector of predictor (independent) variables $y_{i2}$ and a vector of dependent variables $y_{i1}$. In the factor analysis model (2), this also means that the factors $f_i$ are identified with certain of the components of $y_i$. Where there is a natural such separation of variables (such as in the second mass balance model (9) and (10) above), it is then reasonable to prefer the model (3), since the parameters $B$ and $a$ are uniquely defined by the model. However, in other contexts, this violation of the symmetry of the relationships among the variables causes experimenters some concern. For example, if it were actually the case that $y_i = (y_i^{(1)}, y_i^{(2)}, y_i^{(3)})'$, $r = 2$, and $y_i^{(2)} = 5\,y_i^{(3)}$, and we chose $y_{i1} = (y_i^{(1)})$, $y_{i2} = (y_i^{(2)}, y_i^{(3)})'$ in model (3), we would not be able to recover the linear relationship among the elements of $y_i$. Nevertheless, it is always true that model (3) for some choice of $y_{i2}$ yields one of the permissible (equivalent) solutions (fitted models) for models (2) and (4).

Observing the arbitrariness involved in parameterizing the models (even model (3)), and by contrast the uniqueness of the hyperplane $\mathcal{H}$ which geometrically describes the linear relationships among the elements of the latent vectors $y_i$, it is natural to try to parameterize $\mathcal{H}$ directly. One way to do this is by the angles $\theta_j$, $j = 1, \ldots, m - 1$, between the hyperplane $\mathcal{H}$ and any $m - 1$ of the $m$ axes in $m$-dimensional space. This approach is mentioned in ref. 11, where it is applied in the case $m = 2$, $r = 1$ (a linear relationship between two latent variables). However, generalizations to general $m$ and general $r$ appear to be computationally and analytically difficult. Further, the angles $\theta_j$ are not in themselves usually of intrinsic interest.

2.2  Identifiability 

Apart from questions of uniqueness of parameterization, there is also the problem of identifying the linear relationships from data. This is caused by the fact that we do not directly observe the latent variables $y_i$, but instead observe $Y_i = y_i + e_i$. Linear associations (covariance) among the elements of the error vectors $e_i$ can thus be mistaken for (confounded with) linear relationships among the elements of $y_i$, since both types of association can result in covariation between elements of the observed $Y_i$. Consequently, assumptions about the form of the joint distribution of the elements of the error vectors $e_i$ are required in order to identify the linear relationships of interest (among the elements of the latent vectors $y_i$).

Because normal distributions are determined by their mean vectors and covariance matrices, this problem of identifiability always arises for normally distributed $Y_i$'s. Interestingly, only normal distributions suffer from this problem, since information about latent linear relationships can otherwise be obtained from higher moments or cumulants of the distribution (12,13). Thus, normal distributions in measurement error models play the unusual role of the most ‘nonrobust’ or ‘worst-case’ distribution (in contrast to their ‘best-case’ role in most other types of inference). Because use of sample higher moments or cumulants in estimation is computationally cumbersome, adds a large component of variability to estimates, and also requires knowledge of which moments or cumulants to use, the problem of nonidentifiability in normal distributional cases (because it reflects on any procedure based on sample mean vectors and covariance matrices) is also relevant even to situations where we are certain the data are not normally distributed.

Basically, in normal distributional cases, linear relationships among the elements of $y_i$ cannot be identified (consistently estimated) without knowledge about the error covariance matrices

$$\Sigma_i = \mathrm{Cov}(e_i)$$

of the error vectors $e_i$. This knowledge can either come from parametric assumptions about the $\Sigma_i$, or from independent estimates of these matrices obtained from other experiments (calibration data) or from replications of the $Y_i$'s for fixed $y_i$'s.
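The confounding can be seen in the simplest structural case, m = 2 and r = 1, with $Y_1 = x + u$ and $Y_2 = \beta x + v$, where $x$, $u$, $v$ are independent with variances $\sigma_x^2$, $\sigma_u^2$, $\sigma_v^2$. The covariance matrix of $(Y_1, Y_2)$ supplies only three equations for four unknowns, so each admissible value of the error variance $\sigma_u^2$ yields a different slope that reproduces the same covariance matrix exactly. The sketch below is an illustration with arbitrary numbers and hypothetical function names, not a procedure from the paper; it also shows that fixing $\sigma_u^2$ from outside information pins $\beta$ down.

```python
def fit_given_error_variance(s11, s12, s22, var_u):
    """Solve the three covariance equations for (beta, var_x, var_v), given var_u."""
    var_x = s11 - var_u
    if var_x <= 0:
        raise ValueError("var_u must be smaller than Var(Y1)")
    beta = s12 / var_x
    var_v = s22 - beta ** 2 * var_x
    if var_v < 0:
        raise ValueError("var_u is not admissible for this covariance matrix")
    return beta, var_x, var_v


def implied_covariance(beta, var_x, var_u, var_v):
    """(Var Y1, Cov(Y1, Y2), Var Y2) implied by Y1 = x + u, Y2 = beta x + v."""
    return (var_x + var_u, beta * var_x, beta ** 2 * var_x + var_v)


# Covariance of (Y1, Y2) generated by beta = 2, var_x = 1, var_u = var_v = 0.5.
S = implied_covariance(2.0, 1.0, 0.5, 0.5)

# Three different assumed error variances: three different slopes, same S.
for var_u in (0.0, 0.25, 0.5):
    beta, var_x, var_v = fit_given_error_variance(*S, var_u)
    print(var_u, round(beta, 4), implied_covariance(beta, var_x, var_u, var_v))
```

Only the assumed value 0.5 returns the slope 2 used to generate `S`; the covariance matrix alone cannot tell the three fits apart.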

For factor analysis models, the classical assumption made is that the $\Sigma_i$'s are all equal to the same diagonal matrix. This diagonality assumption is usually justified by the belief that choice of a large enough value of $r$ (the number of factors) removes all common sources of variation from the errors.

For errors-in-variables regression models, a wide variety of assumptions about the $\Sigma_i$'s have been used, and software packages exist to fit many of these models (14). The sensitivity of the resulting estimates to the assumptions used is still an open question, although some information is available for the simple case $r = 1$. Common to all of these assumptions is the basic requirement that the regression slopes of the elements of $e_{i1}$ on the elements of $e_{i2}$ are known (15). Here, $e_{i1}$ contains the errors in the observations $Y_{i1}$ of the latent dependent vector $y_{i1}$, and $e_{i2}$ contains the errors in the observations $Y_{i2}$ of the latent independent vector $y_{i2}$. This requirement is clearly essential, since otherwise such regression slopes will be confounded with the matrix $B$ in model (3). In most applications of errors-in-variables regression models, the measurements of $Y_{i1}$ and of $Y_{i2}$ are made separately, and it is reasonable to assume that the regression slopes of the $e_{i1}$ on the $e_{i2}$ are zero.

2.3  Structural  and  functional  models 

An important distinction that is made in the statistical literature on measurement error models is between models in which the latent vectors $y_i$ are treated as unknown constants (functional models), and models in which the $y_i$ are assumed to be independent random vectors (structural models). In the former case, the $y_i$ are themselves parameters of the model. The fact that the number of such parameters increases as the sample size $n$ increases causes major problems for statistical theory. For example, maximum likelihood estimators for the parameters of functional measurement error models need not exist (16,17); or, if they exist, need not be consistent. No completely satisfactory large-sample optimality theory exists for functional measurement error models.

In contrast, structural measurement error models are typically parameterized by a finite number of parameters. Consequently, classical statistical theory (e.g., the theory of maximum likelihood estimation and likelihood ratio tests) can be applied. Even so, some problems remain: complicated finite-sample distributions, nonexistence of all moments of the maximum likelihood estimator, etc. For example, in a strict mathematical sense, finite-length $1 - \alpha$ confidence intervals for the parameters of linear measurement error models ((2), (3) or (4); structural or functional cases) do not exist (18). Commonly used confidence intervals (e.g., large-sample intervals) have arbitrarily small coverage probability when the measurement error variances are very large relative to the spread of the true latent variables. (See ref. 18a for exact results in the case $r = 1$ of model (3).) Fortunately, this theoretical result has minimal importance in most physical science applications because practitioners usually have some idea of the magnitudes of the measurement errors (and error variances) in their experiments. If not, some useful checks to verify that large-sample confidence intervals have the desired coverage probability are available (see ref. 19, pp. 1134-1135). Alternatively, the Creasy-Fieller method of constructing $1 - \alpha$ confidence regions (20,21) can be used, although such regions will not always be intervals.

The distinction between functional and structural measurement error models is similar to the distinction between fixed (designed) factors and random factors in the analysis of variance. In most contexts where factor analysis models are used, investigators are willing to assume that the factors $f_i$ (and thus the latent vectors $y_i$) are random. For example, in the first mass balance model discussed above, the factors $f_t$ represent mass contributions from the sources and might reasonably be assumed to vary randomly over time. On the other hand, one would be less certain that the proportions of mass $a_{ij}$ from the $r$ sources would vary randomly across ‘aerosol properties’ $i$ in the second mass balance model. Such latent variables seem to be fixed characteristics of the ‘aerosol properties’. Consequently, this second mass balance model appears to be a functional measurement error model.

Nevertheless, arguments given in ref. 22 show that for every functional model one can construct a similarly parameterized structural model. Using this structural model, one can more easily determine restrictions ensuring identifiability for the key parameters of both models (structural and functional). Further, the maximum likelihood solution for the structural model (which is the best asymptotically normal estimator of the parameters in that model) is typically also the best asymptotically normal estimator of the corresponding parameters in the functional model (22,23). Consequently, even when one believes that one has a functional linear measurement error model, it is worthwhile starting one's statistical analysis by studying identifiability and choice of estimators for the corresponding structural model. An additional advantage of adopting structural model assumptions is that natural estimators (predictors) of the latent variables $y_i$ based on the observed values $Y_i$ can be defined. These are the conditional expected values $E[y_i \mid Y_i]$.
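In the simplest normal structural case, a scalar latent variable $y \sim N(\mu, \sigma_y^2)$ observed as $Y = y + e$ with independent error $e \sim N(0, \sigma_e^2)$, the conditional expectation is $E[y \mid Y] = \mu + \lambda (Y - \mu)$ with reliability ratio $\lambda = \sigma_y^2 / (\sigma_y^2 + \sigma_e^2)$, so each observation is shrunk toward the mean. The sketch below assumes $\mu$ and both variances are known (in practice they would be estimated) and uses arbitrary simulated values; it compares the mean squared error of the raw observations with that of the predictor.

```python
import random


def predict_latent(Y, mu, var_y, var_e):
    """E[y | Y] when Y = y + e, y ~ N(mu, var_y), e ~ N(0, var_e), y and e independent."""
    reliability = var_y / (var_y + var_e)     # share of Var(Y) due to the latent y
    return mu + reliability * (Y - mu)


rng = random.Random(7)
mu, var_y, var_e = 10.0, 4.0, 1.0
y = [rng.gauss(mu, var_y ** 0.5) for _ in range(50000)]
Y = [yi + rng.gauss(0.0, var_e ** 0.5) for yi in y]
y_hat = [predict_latent(Yi, mu, var_y, var_e) for Yi in Y]

# Mean squared error of the raw observation versus the conditional expectation;
# theory gives var_e = 1.0 and var_y * var_e / (var_y + var_e) = 0.8.
mse_raw = sum((a - b) ** 2 for a, b in zip(Y, y)) / len(y)
mse_pred = sum((a - b) ** 2 for a, b in zip(y_hat, y)) / len(y)
print(round(mse_raw, 3), round(mse_pred, 3))
```

The multivariate analogue replaces the ratio by $\mathrm{Cov}(y_i)\,\mathrm{Cov}(Y_i)^{-1}$.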

3  ESTIMATION  AND  SOFTWARE 

A structural linear measurement error model yields a covariance structure model for the observations $Y_i$. Since the latent variables $y_i$ are random, and the model (1) assumes that the (conditional) distribution of $e_i$ does not depend on $y_i$, it follows that $e_i$ and $y_i$ are statistically independent. Thus,

$$\mathrm{Cov}(Y_i) = \mathrm{Cov}(y_i) + \mathrm{Cov}(e_i),$$

where the assumption that $y_i$ varies in an $r$-dimensional subspace of $m$-dimensional space implies that $\mathrm{Cov}(y_i)$ is singular of rank $r$. For example, in the factor analysis model (2),

$$\mathrm{Cov}(Y_i) = A \Psi A' + D_\theta, \tag{11}$$

where $\Psi$ is the (common) covariance matrix of the factor vectors $f_i$ (which are random because $y_i$ is random) and $D_\theta = \mathrm{diag}(\theta_1, \theta_2, \ldots, \theta_m)$ is the common covariance matrix of the error vectors $e_i$.
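Equation (11) can be evaluated directly. The sketch below uses hypothetical loadings A (m = 3, r = 1), factor variance Ψ and error variances θ_j with exact rational arithmetic. It builds the latent part AΨA' and the observed covariance AΨA' + D_θ, confirms that every 2 × 2 minor of the latent part vanishes (so its rank is r = 1), and shows that adding the diagonal error matrix makes the observed covariance matrix nonsingular.

```python
from fractions import Fraction as F
from itertools import combinations

A = [[F(2)], [F(1)], [F(3)]]          # m x r loading matrix, m = 3, r = 1
Psi = [[F(4)]]                         # r x r covariance of the factors f_i
theta = [F(1), F(2), F(1, 2)]          # error variances: D_theta = diag(theta)
m, r = len(A), len(Psi)

# Latent part A Psi A' (covariance of y_i) and observed covariance (11).
latent = [[sum(A[i][k] * Psi[k][l] * A[j][l] for k in range(r) for l in range(r))
           for j in range(m)] for i in range(m)]
observed = [[latent[i][j] + (theta[i] if i == j else 0) for j in range(m)]
            for i in range(m)]


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


# Every 2 x 2 minor of the latent covariance vanishes, so its rank is r = 1.
minors = [latent[i][k] * latent[j][l] - latent[i][l] * latent[j][k]
          for i, j in combinations(range(m), 2) for k, l in combinations(range(m), 2)]
print(all(v == 0 for v in minors), det3(latent), det3(observed))
```

Estimation programs work in the opposite direction: they search for A, Ψ and D_θ whose implied matrix (11) matches the sample covariance matrix.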

A very popular general computer program for fitting multivariate covariance structure models of reduced rank is the program LISREL VI (24). This program also provides estimated large-sample variances and covariances for the resulting estimators, and tests of fit for models of various ranks $r$. There is a substantial literature dealing with special problems connected with this software (and method of estimation). Many of the relevant papers appear in the journal Psychometrika, although some significant papers in this area also have appeared in such journals as Biometrika, South African Statistical Journal, and the Annals of Mathematical Statistics. Although LISREL VI assumes that the data vectors $Y_i$ are normally distributed, the large-sample properties of the estimators hold for certain nonnormal distributions, and methods exist for adjusting the estimators and tests for elliptical distributions with heavier tails than the normal (25).

LISREL VI is available as part of the SPSS statistical software system, or as an independent program. A similar program, ISU FACTOR (26), can be used with the SAS statistical software system.

One common misconception that users of factor analysis computer software programs have is that the sample correlations of the $Y_i$ can be used in place of the sample covariances without affecting the estimates (particularly in large samples). This is incorrect (3,16). Although use of sample
correlations removes the problem of choice of scale for the data, the model actually fitted is not the same as that assumed for the linear measurement error model. Adjusting the estimates to the correct scale does not correct for the difference in models. Consequently, the typical use of factor analysis in the chemical literature for the first mass balance model discussed in Section 2 does not necessarily find the mass contributions of the sources as specified in the original model.

Although the errors-in-variables regression model (3) in the structural case can be fitted using LISREL or ISU FACTOR, alternative computer software exists to fit this model directly. First, there is a substantial numerical analysis literature on total least squares (27,28) dealing with fitting functional and structural errors-in-variables regression models under various assumptions on the error covariance matrices $\Sigma_i = \operatorname{Cov}(e_i)$. These approaches make use of generalized singular value decompositions of the data matrices $Y = (Y_1, \ldots, Y_n)$ in place of the principal-component type analyses of the sample covariance matrix of the $Y_i$ used by LISREL VI and other covariance structure model software. This yields greater numerical stability and reduced computational complexity (and time). However, the range of models that can be treated by the new total least squares methods is somewhat limited. Such programs also do not provide large-sample measures of accuracy for the estimators, or tests of goodness-of-fit.
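In the simplest case treated in the total least squares literature (uncorrelated errors with equal variance in every column, no intercept), the estimate comes from an ordinary rather than a generalized singular value decomposition of the augmented matrix [X | y]. The sketch below assumes that case. The function name `tls` and the simulated data are illustrative only.

```python
import numpy as np


def tls(X, y):
    """Total least squares estimate of b in y ~ X b (no intercept).

    Assumes the errors in every column of X and in y are uncorrelated with
    equal variance, the basic case in the TLS literature.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float).reshape(-1, 1)
    p = X.shape[1]
    # Right singular vector for the smallest singular value of [X | y]
    _, _, vt = np.linalg.svd(np.hstack([X, y]))
    v = vt[-1]
    if abs(v[p]) < 1e-12:
        raise ValueError("generic TLS solution does not exist")
    return -v[:p] / v[p]


rng = np.random.default_rng(0)
n = 200
b_true = np.array([2.0, -1.0])
x_latent = rng.normal(scale=3.0, size=(n, 2))
X = x_latent + rng.normal(scale=0.5, size=(n, 2))      # predictors observed with error
y = x_latent @ b_true + rng.normal(scale=0.5, size=n)  # response observed with error

b_tls = tls(X, y)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("TLS:", np.round(b_tls, 3), " OLS:", np.round(b_ols, 3))
```

The SVD route needs no iteration. As noted above, it also gives no standard errors or tests of fit.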

Alternatively, the computer program SUPER CARP (14) can handle a wide variety of linear errors-in-variables regression models of both functional and structural type, including models in which the error covariance matrices are heterogeneous ($\operatorname{Cov}(e_i) = \Sigma_i$, with the $\Sigma_i$ possibly unequal). This program also has the advantages of producing estimated large-sample variances for the estimators, providing some diagnostics for goodness-of-fit of the models, and also tests of fit. The estimators produced by SUPER CARP incorporate methods suggested in Ref. 14 that produce solutions with better performance in samples of moderate size than the maximum likelihood algorithms (which occasionally produce extremely large estimates, or fail to converge at all).

A third very useful program is ODRPACK (29,30). Although this program is somewhat limited in the types of linear measurement error models that it can handle, it has the advantage of also being able to fit nonlinear measurement error models of the functional type. It also incorporates up-to-date numerical optimization methods.
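SciPy's `scipy.odr` module is a Python interface to ODRPACK, so a nonlinear functional measurement error fit can be sketched as follows. The exponential model, the simulated data, and the assumption that the error standard deviations in x and y are known (0.05) are all hypothetical.

```python
import numpy as np
from scipy import odr


def model_fn(beta, x):
    """Hypothetical nonlinear functional model y = b0 * exp(b1 * x)."""
    return beta[0] * np.exp(beta[1] * x)


rng = np.random.default_rng(1)
x_true = np.linspace(0.0, 2.0, 40)
beta_true = np.array([1.5, 0.8])
sd_x = sd_y = 0.05                                  # error SDs, assumed known
x_obs = x_true + rng.normal(scale=sd_x, size=x_true.size)
y_obs = model_fn(beta_true, x_true) + rng.normal(scale=sd_y, size=x_true.size)

# RealData converts the standard deviations into weights for the ODR fit
data = odr.RealData(x_obs, y_obs,
                    sx=np.full(x_obs.size, sd_x), sy=np.full(y_obs.size, sd_y))
fit = odr.ODR(data, odr.Model(model_fn), beta0=[1.0, 0.5]).run()
print("estimates:      ", np.round(fit.beta, 3))
print("standard errors:", np.round(fit.sd_beta, 3))
```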

Mention should also be made of robust fitting methods for functional and structural measurement error models. These methods use either Huber's approach (31) to robust estimation or Hampel's outlier-resistant theory (32), which is based on measures of the influence of extreme observations. They are still under development, but offer the promise of less sensitivity to outliers and other deviant measurements. Some recent references which discuss robust approaches are refs. 33-36. In the chemical mass balance literature, a pioneering effort in this direction is presented in ref. 37.
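The sketch below shows Huber's approach in its simplest setting. It computes a Huber M-estimate for an ordinary regression (error-free predictors) by iteratively reweighted least squares. This is not a robust errors-in-variables method. It is the kind of robust regression step that could be used when fitting a classical regression model such as (15). The tuning constant 1.345 is the conventional choice; the function name and the data are hypothetical.

```python
import numpy as np


def huber_irls(X, y, c=1.345, n_iter=100, tol=1e-10):
    """Huber M-estimate of linear regression coefficients via iteratively
    reweighted least squares (IRLS).

    X must already contain a column of ones if an intercept is wanted.
    The residual scale is re-estimated at each step by the normalized MAD.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]            # ordinary LS start
    for _ in range(n_iter):
        resid = y - X @ beta
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        scale = max(scale, 1e-12)                           # guard against zero
        u = np.abs(resid) / scale
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))  # Huber weights
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)
y[[5, 15, 25, 35, 45]] += 20.0                              # five gross outliers
X = np.column_stack([np.ones_like(x), x])

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_huber = huber_irls(X, y)
print("OLS:  ", np.round(beta_ols, 3))
print("Huber:", np.round(beta_huber, 3))
```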

My own recent research on estimation methods for fitting errors-in-variables regression models has concentrated on a type of ‘correction-for-attenuation’ approach long used by psychometricians (14,15,38). To discuss this approach it is convenient to switch to a less subscripted notation for model (3). Let $Y_i$ be the measurements on the latent dependent variables $y_i$ and let $X_i$ be the measurements on the latent predictor variables $x_i$. Thus

$$y_i = Bx_i + a, \qquad Y_i = y_i + e_i, \qquad X_i = x_i + f_i, \tag{12}$$

where $x_i$, $e_i$, $f_i$ are independent of each other, $E(e_i) = 0$, $E(f_i) = 0$. (Note that the structural form of the model is being assumed. However, recall that good estimators for the structural model are also good estimators for the corresponding functional model.)

Assume that

$$E(x_i) = \mu, \qquad \operatorname{Cov}(x_i) = \Sigma_x, \qquad \operatorname{Cov}(f_i) = \Sigma_f, \qquad i = 1, 2, \ldots, n,$$


Original Research Paper




and let

$$\Xi = \Sigma_x(\Sigma_x + \Sigma_f)^{-1}. \tag{13}$$

Then, if the $x_i$ and $f_i$ are normally distributed,

$$E(x_i \mid X_i) = \Xi X_i + (I - \Xi)\mu. \tag{14}$$

Even if the $X_i$ are not normally distributed, the right-hand side of (14) is the best linear predictor of $x_i$ given $X_i$, in the sense of minimizing the expected squared-error loss of prediction. The matrix $\Xi$ in (13) is called the reliability matrix of the measurements $X_i$ of the latent predictor variables $x_i$. If $\Xi$ is known (or can be consistently estimated), substituting $\Xi X_i + (I - \Xi)\mu$ for $x_i$ in (12) yields a classical regression model

$$Y_i = B\Xi(X_i - \mu) + (a + B\mu) + e_i^{*}, \tag{15}$$

where

$$e_i^{*} = B\left[(x_i - \mu) - \Xi(X_i - \mu)\right] + e_i$$

is uncorrelated with $X_i - \mu$. This model can now be fit, with $\mu$ replaced by $\bar X$, by classical least squares or robust regression methods [31] to yield estimates $\hat\Gamma$, $\hat\gamma$ of $\Gamma = B\Xi$ and $\gamma = a + B\mu$. Since $\Xi$ is known (or we have a consistent estimator $\hat\Xi$ of $\Xi$), the equations

$$\hat\Gamma = \hat B\hat\Xi, \qquad \hat\gamma = \hat a + \hat B\bar X$$

can be solved for $\hat B$ and $\hat a$. The resulting estimators $\hat B$, $\hat a$ are consistent estimators of $B$ and $a$. In the normal-$X_i$, normal-$Y_i$ case, $\hat B$ and $\hat a$ are best asymptotically normal estimators when $\hat\Gamma$ and $\hat\gamma$ are fit by least squares (or maximum likelihood) from (15), and $\Xi$ is either known or replaced by its maximum likelihood estimator $\hat\Xi$ based on the data $X_1, \ldots, X_n$ [15]. Standard confidence regions for the elements of $\Gamma$ and $\gamma$ can easily be converted to confidence regions (and intervals) for the elements of $B$ and $a$. The method can also be extended to nonlinear errors-in-variables regression models [38].
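The estimator just described can be sketched in a few lines of NumPy, assuming the reliability matrix Ξ is known exactly. As in the estimating equations above, μ is replaced by the sample mean of the X_i. The function name `attenuation_corrected_fit` and all simulation settings are hypothetical. A naive regression of Y_i on X_i is included for contrast.

```python
import numpy as np


def attenuation_corrected_fit(X, Y, Xi):
    """Correction-for-attenuation estimates of B and a in model (12).

    X  : (n, p) observed predictors X_i
    Y  : (n, q) observed responses Y_i
    Xi : (p, p) reliability matrix, assumed known here
    Returns B_hat with shape (q, p) and a_hat with shape (q,).
    """
    X = np.asarray(X, float)
    Y = np.asarray(Y, float)
    xbar, ybar = X.mean(axis=0), Y.mean(axis=0)
    # Least squares fit of Y_i on X_i - xbar: the slope estimates Gamma = B Xi,
    # and the intercept (ybar, since regressors are centered) estimates gamma.
    coef = np.linalg.lstsq(X - xbar, Y - ybar, rcond=None)[0]   # (p, q)
    Gamma_hat, gamma_hat = coef.T, ybar
    # Solve Gamma_hat = B_hat Xi and gamma_hat = a_hat + B_hat xbar.
    B_hat = np.linalg.solve(Xi.T, Gamma_hat.T).T
    a_hat = gamma_hat - B_hat @ xbar
    return B_hat, a_hat


# Hypothetical simulation from model (12) with normal x_i, e_i, f_i
rng = np.random.default_rng(2)
n = 5000
mu = np.array([1.0, -2.0])
Sigma_x = np.array([[1.0, 0.3], [0.3, 1.0]])
Sigma_f = np.diag([0.5, 0.25])
B = np.array([[2.0, -1.0]])
a = np.array([0.5])

x = rng.multivariate_normal(mu, Sigma_x, size=n)
X = x + rng.multivariate_normal(np.zeros(2), Sigma_f, size=n)
Y = x @ B.T + a + rng.normal(scale=0.3, size=(n, 1))

Xi = Sigma_x @ np.linalg.inv(Sigma_x + Sigma_f)               # eq. (13)
B_hat, a_hat = attenuation_corrected_fit(X, Y, Xi)
B_naive = np.linalg.lstsq(X - X.mean(axis=0), Y - Y.mean(axis=0), rcond=None)[0].T
print("corrected B:", np.round(B_hat, 3), " a:", np.round(a_hat, 3))
print("naive B:    ", np.round(B_naive, 3))
```

The naive slopes land near BΞ, whose population value for these settings is about (1.26, −0.62). This shows the attenuation that the correction removes.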

The main advantage of this approach is apparent: one can use existing statistical software (and confidence region procedures) for classical regression to estimate the parameters. However, there is a price to pay: one must know or estimate $\Xi$. As noted by Gleser [15,38], this requires either replications on the $X_i$ for each distinct $x_i$, or the use of independent calibration data for the $X$-measurements. The latter approach is familiar to chemists; for example, one can observe the $X_i$ obtained for known values of the $x_i$ in laboratory experiments. Just how one estimates $\Xi$ depends upon the context, that is, on what one is willing to assume about the relationship of the calibration experiments to the experimental context in which the measurements $Y_i$, $X_i$ are obtained. Although extra information is required for this approach, there is a welcome bonus: from $\hat\Xi$ one can determine the accuracy that can be expected in estimating $B$ and $a$ (and can also spot such potential problems as multicollinearity in the latent variables $x_i$). Constraints of space do not allow further detail, so readers interested in this approach should consult Gleser [15,38].
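For the replication route, here is a simple method-of-moments sketch. It assumes exactly two independent replicate measurements of each x_i, with a common error covariance Σ_f. Differences of replicates estimate 2Σ_f, and replicate means have covariance Σ_x + Σ_f/2. This is not the maximum likelihood estimator mentioned earlier. The function name and the simulation settings are hypothetical.

```python
import numpy as np


def reliability_from_duplicates(X1, X2):
    """Estimate reliability matrices from duplicate measurements.

    X1, X2 : (n, p) arrays, X_ik = x_i + f_ik with f_i1, f_i2 independent,
             each with covariance Sigma_f.
    Returns (Xi_single, Xi_mean): reliability of one measurement and of the
    average of the two measurements.
    """
    d = X1 - X2                                        # Cov(d) = 2 Sigma_f
    Sigma_f = np.atleast_2d(np.cov(d, rowvar=False)) / 2.0
    xbar = (X1 + X2) / 2.0                             # Cov = Sigma_x + Sigma_f / 2
    S_mean = np.atleast_2d(np.cov(xbar, rowvar=False))
    Sigma_x = S_mean - Sigma_f / 2.0
    Xi_single = Sigma_x @ np.linalg.inv(Sigma_x + Sigma_f)
    Xi_mean = Sigma_x @ np.linalg.inv(S_mean)
    return Xi_single, Xi_mean


rng = np.random.default_rng(4)
n = 20000
Sigma_x = np.array([[1.0, 0.3], [0.3, 1.0]])
Sigma_f = np.diag([0.5, 0.25])
x = rng.multivariate_normal([0.0, 0.0], Sigma_x, size=n)
X1 = x + rng.multivariate_normal([0.0, 0.0], Sigma_f, size=n)
X2 = x + rng.multivariate_normal([0.0, 0.0], Sigma_f, size=n)

Xi_single, Xi_mean = reliability_from_duplicates(X1, X2)
Xi_true = Sigma_x @ np.linalg.inv(Sigma_x + Sigma_f)
print(np.round(Xi_single, 3))
print(np.round(Xi_true, 3))
```

Averaging replicates raises the reliability. This is one practical reason to build replication into the design.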


4  LINEAR  MASS  BALANCE  MODELS 

As in most other real applications, mass balance models present the statistician with a choice between the desire to reflect all sources of variation and the need for parametric simplicity. For example, both of the mass balance models discussed in Section 2 assumed that the mass fractions (source compositions) $\alpha_{ij}$ do not vary over time. As Cheng and Hopke [37, p. 49] note, this is not realistic for all sources $j$. Consequently, any measurements $A_{ij}$ of the mass fractions $\alpha_{ij}$ taken at a particular time $t_0$ may not be valid for other times $t \neq t_0$.

Cheng and Hopke (37, p. 49) also point out that mass balance models are probably never exactly correct. Some mass may be lost due to chemical reaction along the path taken by particles to the receptor, while on the other hand there may be contributions of mass from sources not accounted for in the model.
A model which reflects the above-mentioned sources of variation is the following:

$$\begin{aligned}
C_{it} &= c_{it} + \omega_{it}, \\
c_{it} &= \sum_{j=1}^{r} \alpha_{ij}(t)\, s_{jt} + \epsilon_i(t), \\
A_{ij} &= \alpha_{ij}(0) + e_{ij}, \qquad 1 \le i \le m,\ 1 \le j \le r,\ 1 \le t \le T.
\end{aligned} \tag{16}$$

Here, $C_{it}$ is the measurement at the receptor of the mass of property $i$ at time $t$, $c_{it}$ is the true mass of property $i$ at time $t$, $\omega_{it}$ reflects errors of measurement in $C_{it}$, and $\epsilon_i(t)$ is the error in the mass balance equations due to unidentified sources and mass lost to chemical reactions in transit. Also, $\alpha_{ij}(t)$ is the mass fraction of property $i$ from source $j$ at time $t$. This mass fraction is measured by $A_{ij}$ at time $t = 0$, with error $e_{ij}$.

In this model, all quantities are assumed to be random. (If $\alpha_{ij}(t)$, $1 \le i \le m$, stays constant over times $t$ for any source $j$, we will simply assume that the variances of these $\alpha_{ij}(t)$ are zero.) Realistically, the quantities indexed by the time index $t$ should have a time series correlation structure. However, as a strong simplifying assumption, we may assume that the times $t$ at which observations are taken are sufficiently spread out that such correlations are negligible. We also assume that the underlying process is sufficiently stable that the joint distributions of time-indexed quantities are identical at each time point $t$. Consequently, we assume that the random matrices $((\alpha_{ij}(t)))$ are independently and identically distributed (i.i.d.) with unknown mean matrix $\Lambda = ((\lambda_{ij}))$. Similarly, we assume that the vectors $s(t) = (s_{1t}, \ldots, s_{rt})'$, $t = 1, \ldots, T$, of mass contributions from sources $1, \ldots, r$ are i.i.d., that the vectors $\epsilon(t) = (\epsilon_1(t), \ldots, \epsilon_m(t))'$, $t = 1, \ldots, T$, are i.i.d. with mean vector 0, and that the vectors $\omega(t) = (\omega_{1t}, \ldots, \omega_{mt})'$, $t = 1, \ldots, T$, of measurement errors in the $C_{it}$ are i.i.d. with mean vector 0.

Let

$$u(t) = ((\alpha_{ij}(t) - \lambda_{ij})) = ((u_{ij}(t))). \tag{17}$$

The $u_{ij}(t)$ are the values of the random mass fractions centered at their means $\lambda_{ij}$. Consequently, the $u(t)$, $0 \le t \le T$, are i.i.d. random matrices with $E(u(t)) = 0$.

We assume that the $u(t)$, $0 \le t \le T$, the $s(t)$, $1 \le t \le T$, and the measurement errors $\omega(t)$ and $e_{ij}$, $1 \le i \le m$, $1 \le j \le r$, are mutually statistically independent. This assumption is reasonable, since the variations of the mass fractions and mass contributions are likely to be unrelated to each other, or to errors made in measurement.

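Substitution of (17) into (16) yields the following model for the observed quantities $C_{it}$, $A_{ij}$:

$$\begin{aligned}
C_{it} &= \sum_{j=1}^{r} \lambda_{ij} s_{jt} + g_{it}, \\
g_{it} &= \epsilon_i(t) + \omega_{it} + \sum_{j=1}^{r} u_{ij}(t)\, s_{jt}, \\
A_{ij} &= \lambda_{ij} + u_{ij}(0) + e_{ij} \equiv \lambda_{ij} + f_{ij}, \qquad 1 \le i \le m,\ 1 \le j \le r,\ 1 \le t \le T.
\end{aligned} \tag{18}$$

If we let

$$C(t) = (C_{1t}, \ldots, C_{mt})', \qquad g(t) = (g_{1t}, \ldots, g_{mt})', \qquad A = ((A_{ij})), \qquad F = ((f_{ij})),$$

we can write (18) in vector-matrix form as

$$\begin{aligned}
C(t) &= \Lambda s(t) + g(t), \qquad t = 1, \ldots, T, \\
A &= \Lambda + F.
\end{aligned} \tag{19}$$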

The model (19) has the form of a factor analysis model, but with the important addition of an unbiased and independent estimator $A$ of the factor loading matrix $\Lambda$. Such a model has not previously been considered in the literature. However, the error term $g(t)$ in (19) does not meet the requirements for classical factor analysis. To see this, note from (18) that $g(t)$ is a function of the vector $s(t)$ of mass contributions from the sources, and also of the equation error vector $\epsilon(t) = (\epsilon_1(t), \ldots, \epsilon_m(t))'$. $\epsilon(t)$ reflects both loss of mass due to chemical reaction (which may be related to the total mass released into the environment) and other unidentified sources of mass (which may be correlated with mass produced by identified sources). Any assumption that $\epsilon(t)$ and $s(t)$ are independent could therefore be erroneous. For this reason, and because $g(t)$ is a function of $s(t)$, the usual assumption made in classical factor analysis that $s(t)$ and $g(t)$ are independent seems to be excessively strong. Fortunately, the large-sample properties of classical factor analysis estimates continue to hold under weaker assumptions concerning the joint distribution of $s(t)$ and $g(t)$; see ref. 39. Nevertheless, if we obtain estimates of $\Lambda$ and the $s(t)$ using classical assumptions, it will be necessary to check that these (or similar) assumptions hold. Such verification will have to be reserved for later research.
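A short simulation of (16) to (19) makes the dependence of g(t) on s(t) visible. All dimensions, mean mass fractions, error standard deviations and the gamma distribution for s(t) are hypothetical choices. The errors are taken to be normal and mutually independent, so in this sketch only the term Σ_j u_ij(t)s_jt links g(t) to s(t).

```python
import numpy as np

rng = np.random.default_rng(3)
m, r, T = 4, 2, 20000                      # properties, sources, time points
Lam = np.array([[0.30, 0.05],               # hypothetical mean mass fractions
                [0.10, 0.40],
                [0.20, 0.20],
                [0.05, 0.25]])
sd_u, sd_eps, sd_omega = 0.02, 0.02, 0.02    # hypothetical error SDs

s = rng.gamma(shape=2.0, scale=5.0, size=(T, r))   # s(t): source contributions
u = rng.normal(scale=sd_u, size=(T, m, r))         # u(t): centered mass fractions
eps = rng.normal(scale=sd_eps, size=(T, m))        # epsilon(t): equation errors
omega = rng.normal(scale=sd_omega, size=(T, m))    # omega(t): measurement errors

alpha = Lam + u                                        # alpha(t) = Lambda + u(t)
C = np.einsum("tij,tj->ti", alpha, s) + eps + omega    # eq. (16)
g = C - s @ Lam.T                                      # g(t) = C(t) - Lambda s(t)

total = s.sum(axis=1)
small = total < np.quantile(total, 0.25)
large = total > np.quantile(total, 0.75)
print("SD of g(t), small s(t):", np.round(g[small].std(axis=0), 3))
print("SD of g(t), large s(t):", np.round(g[large].std(axis=0), 3))
```

The spread of g(t) grows with the source contributions. This is the kind of dependence that the classical factor analysis assumption rules out.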

In the light of model (19), both of the models for linear mass balance mentioned in Section 2 have serious deficiencies. The first (factor analysis) model lacks the identifiability properties of model (19), treats the mass fractions $\alpha_{ij}(t)$ as constant over time, and ignores the prior knowledge in the estimates $A$ of the factor loading (mass fraction) matrix $\Lambda$ for known sources. However, this model does share two kinds of flexibility with model (19): it allows an unknown number of sources in addition to those explicitly modeled, and it models the variation of the source mass contribution vector $s(t)$ over time.

The second linear mass balance model discussed in Section 2 has identifiable parameters and incorporates estimates of $\Lambda$. Unfortunately, this model is static (it ignores variation in the $\alpha_{ij}(t)$ and $s_{jt}$ over time), and requires prior knowledge of the number $r$ of sources. It also makes the very strong extra distributional assumption that the vectors $(C_i, A_{i1}, \ldots, A_{ir})'$, $i = 1, \ldots, m$, are independent.

Neither  of  the  two  models  discussed  allows  for 
random  errors  in  the  mass  balance  equation 
due  to  loss  of  mass  in  transit  by  chemical  reaction. 

Due to lack of space, it is only possible to sketch an approach to estimation of the parameters in model (19). Any such approach will require us to model the common distributions of the error vectors $g(t)$ and the error matrix $F$, particularly the covariance matrices of their elements.

My own favored approach would be Bayesian (or empirical Bayesian), based on recent work of Press and Shigemasu (40). These authors provide an approximate (in large samples, here large $T$) Bayesian approach to factor analysis. It uses normality assumptions for the $g(t)$ and conjugate priors for the parameters ($\Lambda$, the common mean vector and covariance matrix of the $s(t)$, and the common covariance matrix of the $g(t)$). Using the prior for $\Lambda$ and the data $A = ((A_{ij}))$, one can update the prior to form a posterior for $\Lambda$ given $A$. This posterior distribution can then play the role of the prior distribution of $\Lambda$ in Press and Shigemasu's Bayesian analysis. This is an appropriate way to use the information conveyed by the $A_{ij}$, since these are often not really measurements but may instead be partly obtained from subjective judgments of the experimenters. Press and Shigemasu's analysis (40) yields posterior modal estimators of $\Lambda$ and posterior mode ‘predictors’ for the source contribution vectors $s(t)$. It also gives posterior credible regions (Bayesian confidence regions) for these quantities, and tests of fit for the model (particularly for the number of sources $r$). As already noted, it will be necessary to check whether the large-$T$ properties claimed for these procedures continue to hold under the violations of classical factor analysis assumptions which we have noted in model (19).
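The first step, updating a prior for Λ with the data A, can be sketched under a deliberately simple assumption. The sketch takes independent normal priors for the λ_ij, and independent normal errors f_ij with known variances. This is not Press and Shigemasu's procedure. It is only a conjugate normal-normal update that shows how the measured profiles would enter. The function name and the numbers are hypothetical.

```python
import numpy as np


def posterior_lambda(prior_mean, prior_var, A, err_var):
    """Elementwise normal-normal update of the loading matrix Lambda given A.

    Assumes, for illustration only, independent priors
    lambda_ij ~ N(prior_mean_ij, prior_var_ij) and A_ij = lambda_ij + f_ij
    with f_ij ~ N(0, err_var_ij), all elements independent.
    Returns posterior means and variances, with the same shape as A.
    """
    prior_mean = np.asarray(prior_mean, float)
    prior_var = np.asarray(prior_var, float)
    A = np.asarray(A, float)
    err_var = np.asarray(err_var, float)
    post_var = 1.0 / (1.0 / prior_var + 1.0 / err_var)   # precisions add
    post_mean = post_var * (prior_mean / prior_var + A / err_var)
    return post_mean, post_var


# Hypothetical 3 x 2 example
prior_mean = np.full((3, 2), 0.2)
prior_var = np.full((3, 2), 0.01)
A = np.array([[0.30, 0.05], [0.10, 0.40], [0.20, 0.20]])
err_var = np.full((3, 2), 0.01)
mean, var = posterior_lambda(prior_mean, prior_var, A, err_var)
print(mean)   # halfway between prior mean and A, as the two variances are equal
print(var)    # 0.005 in every position
```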


5  CONCLUDING  REMARKS 

A subject as vast and varied as that of linear measurement error models cannot possibly be covered in a single survey paper. For this reason, the comprehensive surveys in the books of Fuller (14) and Kendall and Stuart (41) are highly recommended. The present paper has highlighted common models, themes and problems in the measurement error literature, in the hope that this brief introduction will help chemists gain access to that literature for use in their own research. The modeling and treatment of measurement (and equation) errors is a fundamental problem in the statistical analysis of physical data. It must be properly addressed if conclusions reached by scientists are to be valid. Although the problems that arise are analytically difficult, they are unavoidable. Fortunately, some of the best minds in science have addressed these problems over the last fifty years, and there are many useful methods available to practitioners. In the context of linear mass balance models, the strengths and weaknesses of two of these approaches have been discussed, and a new model incorporating their strengths (in a modeling sense) has been proposed. It is hoped that further research on this and similar models will improve the methods currently used to analyze data based on linear mass balance models.
INTRODUCTION

Professor Gleser has provided an exquisite overview and integration of the error structure and statistical modeling that may be employed to characterize the results of modern, multivariable chemical metrology. He demonstrates the equivalence of three representations of the linear, multivariate statistical relationship: factor analysis (Gleser's eq. (2)), errors-in-variables regression (eq. (3)) and implicit functional (eq. (4)) models. This demonstration is especially satisfying, because it makes plain that we may approach linear models in chemistry from apparently different, yet intrinsically equivalent, perspectives. His ‘new’ model (eq. (19)) for treating the inevitable nonlinearities or unsatisfied assumptions in real chemical experiments should prove particularly interesting to those involved in difficult environmental and field studies. Finally, the essential difference between structural and functional models reveals a basic dichotomy. In the physical sciences we generally find causal (functional) relationships, often involving fixed latent variables. Yet the statistical estimation procedures that we must use are ‘satisfactory’ (in terms of existence and consistency) for the multivariate structural models. Resolution is promised, however, through the asymptotic behavior of the estimators.

The relevance of Professor Gleser's essay to chemical metrology follows from two facts. All of the chemical variables that we measure are subject to error, and, at a rapidly increasing pace, our measurement systems are producing high-dimensional data. Except for defined standards, the ‘error-free’ independent variables of classical univariate chemistry are, in fact, simply unattainable asymptotes covered by the more general linear measurement error models. Furthermore, the division into dependent and independent classes becomes increasingly problematic as the number of variables increases.

In Gleser's paper we have been given a fundamental overview of statistical issues and statistical references. In keeping with the spirit of chemometrics, I shall attempt to complement that with some chemical approaches, assumptions, and references.


THE  METROLOGICAL  CONTEXT 

As noted above, effectively all of our metrological parameters must be viewed as estimates, complete with error (generally random and systematic). Certain characteristics of metrology in the physical sciences, however, have important implications for the measurement error models discussed by Gleser. The most important of these are: (1) theoretical and/or controlled, laboratory-based estimates for the error-covariance matrix; and (2) multiple levels of measurement, where estimated quantities (latent variables) may be increasingly remote from the directly observed signals of chemical sensors. Point 1, already alluded to by Gleser, means that in many cases the variances and correlations may be precisely estimated from physical theory or presumed distribution functions (e.g., Poisson). Alternatively, they may be derived from extensive, controlled laboratory evaluations. That is, we may often supply the covariance matrix at the outset, rather than estimating it with the linear model that we are fitting. The second point is illustrated by the following metrological level diagram:


Level   Variable                              Realization
1       y: instrumental signal                direct observation
           (sensor response)
2       x: species-x concentration            calibration, deconvolution of y
3       θ: source strength/system property    calibration, deconvolution of x

The essential point is that the ‘measured quantities’ appearing as parameters in the linear measurement error models may themselves be the product of modeling. As we move from level 1 toward level 3, the measurements become more and more indirect. For example, we can never directly observe the concentration (x) of a chemical substance; we must compute it from a calibration model and the response of a chemical sensor. Similarly, we cannot directly observe the strength of a pollutant source (θ) at a receptor site; we must compute it from the computed chemical concentration vector or matrix (x) obtained at that site.

The multiple levels of metrology matter for the application of measurement error models because the associated deconvolution modeling of signals and concentrations can lead to model error (missing components, systematic model/parameter error, ...) as well as correlated estimates. Further comment on this matter will be given under the subheadings of factor analysis and measurement refinement.


FACTOR  ANALYSIS 

Factor analysis (FA) is employed in the physical sciences in at least three different ways. First, as with cluster analysis, it can serve as a very useful exploratory tool, particularly in its graphical mode, to make inferences (or conjectures) concerning concealed relationships in multivariable chemical systems (1). Principal components projections give a rather efficient visualization of high-dimensional space. From them one can draw inferences about clusters and/or classes of objects, lower-dimensional (lines, planes) mixture relations among end-member classes, important nonlinearities, and possible outliers and/or ‘unusual’ samples. Beyond pure visualization, one may seek to simplify the representation by removing factors (components) that appear to derive largely from noise, or perform some simple rotations to inspire chemical insight. These applications of FA can be extremely powerful when linked with the well-trained eye or the inspired scientific mind. They are, however, replete with pitfalls if employed as automatic routines.

A second application of factor analysis is to provide an empirical, linear approximation of the multivariate structure of a chemical class. Such ‘class modeling’, based on the first few principal components of a class of ‘similar’ chemical members, is commonly known as ‘soft modeling’. It has become one of the major descriptive and discriminating tools for chemical classification and pattern recognition studies (2,3).
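Here is a minimal sketch of such a class model. The class is summarized by its mean and first k principal components, and a new sample is judged by its residual distance from that subspace. Full class-modeling procedures add statistical limits for deciding class membership; those are omitted here. The class name `PCAClassModel` and the simulated data are hypothetical.

```python
import numpy as np


class PCAClassModel:
    """Soft class model: class mean plus the first k principal components."""

    def __init__(self, k):
        self.k = k

    def fit(self, Y):
        Y = np.asarray(Y, float)
        self.mean_ = Y.mean(axis=0)
        # Rows of vt are principal directions of the centered class data
        _, _, vt = np.linalg.svd(Y - self.mean_, full_matrices=False)
        self.P_ = vt[: self.k].T              # (variables, k) loading matrix
        return self

    def residual_sd(self, Y):
        """RMS residual of each sample after projection onto the class subspace."""
        Z = np.atleast_2d(np.asarray(Y, float)) - self.mean_
        resid = Z - (Z @ self.P_) @ self.P_.T
        return np.sqrt((resid ** 2).mean(axis=1))


rng = np.random.default_rng(5)
profiles = rng.uniform(0.0, 1.0, size=(2, 10))          # hypothetical class profiles
train = rng.uniform(0.5, 2.0, size=(30, 2)) @ profiles
train += rng.normal(scale=0.01, size=train.shape)       # small measurement noise

model = PCAClassModel(k=2).fit(train)
member = np.array([1.2, 0.8]) @ profiles                # new mixture of the same profiles
outsider = rng.uniform(0.0, 1.0, size=10)               # unrelated pattern
print("member residual:  ", model.residual_sd(member))
print("outsider residual:", model.residual_sd(outsider))
```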

The third role for factor analysis is linear functional modeling. Casual use is ruled out in this case. Assumptions and parameterization must be recognized; viz., we are explicitly treating the model

$$y = xA + e \tag{1}$$

where y is the matrix (samples × variables) of responses for a given set of variables; x is the matrix (samples × components) of pure component concentrations; A is a design or chemical profile matrix (components × variables), reflecting normalized responses or spectra of pure components; and e is the measurement error matrix (samples × variables). (Eq. (1) is the transpose of Gleser's FA equation; it follows the convention of putting ‘objects’ or samples in the rows of y (4).) The fundamental chemical factor analytic issue is that eq. (1) represents a linear functional relationship; it is not an eigenvector equation. In other words, the factor score matrix x has meaning in terms of chemical components, which have chemically characteristic spectra (or fingerprints or profiles) represented by the loading matrix A. Thus, although FA should lead to the proper estimate for the number of linearly independent (estimable) chemical components, ad hoc manipulations such as VARIMAX cannot in general be expected to produce chemically correct loadings. (For one thing, chemical profiles are rarely orthogonal.)
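The rotational freedom in eq. (1) can be seen numerically. The sketch below uses hypothetical Gaussian band shapes, uncentered data and small noise. The leading right singular vectors span the same space as the true profiles. Yet they are orthonormal, while the true profiles overlap, and any invertible matrix R reproduces xA equally well through (xR)(R⁻¹A).

```python
import numpy as np

rng = np.random.default_rng(6)
n_samples, n_vars, r = 25, 40, 2

# Hypothetical overlapping (non-orthogonal) profiles: rows of A
grid = np.linspace(0.0, 1.0, n_vars)
A = np.vstack([np.exp(-((grid - 0.40) / 0.10) ** 2),
               np.exp(-((grid - 0.55) / 0.10) ** 2)])
x = rng.uniform(0.1, 1.0, size=(n_samples, r))                # concentrations
y = x @ A + rng.normal(scale=1e-3, size=(n_samples, n_vars))  # eq. (1)

# Abstract factors: leading right singular vectors of the uncentered data
_, _, vt = np.linalg.svd(y, full_matrices=False)
abstract = vt[:r]                                             # orthonormal rows

# The true profiles lie (almost) in the span of the abstract factors ...
coef = np.linalg.lstsq(abstract.T, A.T, rcond=None)[0]
span_resid = np.linalg.norm(abstract.T @ coef - A.T) / np.linalg.norm(A)

# ... but the abstract factors are not the profiles: they are orthogonal,
# whereas the true profiles overlap.
cos_true = A[0] @ A[1] / (np.linalg.norm(A[0]) * np.linalg.norm(A[1]))
cos_abstract = abstract[0] @ abstract[1]

# Any invertible R gives an equally good factorization (xR)(R^-1 A).
R = np.array([[1.0, 0.5], [0.0, 1.0]])
same_fit = np.allclose((x @ R) @ (np.linalg.inv(R) @ A), x @ A)

print("relative residual of A in abstract span:", span_resid)
print("cosine between true profiles:", cos_true)
print("cosine between abstract factors:", cos_abstract)
print("rotated factorization reproduces xA:", same_fit)
```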

It should be noted that eq. (1) is employed broadly. It is used not only in environmental source apportionment (‘mass balance’ as used by Gleser), but also in the chemical laboratory, where the x's represent concentrations of chemical components of the system being analyzed. These two types of application reflect levels 3 and 2, respectively, of the chemical metrology level structure presented earlier. In both cases, residual variance may be employed to estimate measurement error, or to test presumed measurement error.

Several issues related to the validity and application of eq. (1) deserve exposure. The first is the number of linearly independent components, r. Unfortunately, r is rarely known, except in the case of single components or fully isolated components (as in high-resolution spectrometry or chromatography). One of the most important functions of FA, therefore, is to make possible an estimate of r, given an appropriate data matrix. A number of magic rules exist to produce such estimates. One of the more reliable approaches appears to be an F-test, as outlined by Malinowski (5). It requires that the errors be homogeneous (constant variance over all factors) and uncorrelated. Starting with the least significant principal component, error eigenvalues are tested sequentially for statistical significance. A second issue, also treated in ref. 5, relates to the testing of possible target vectors (columns of the A matrix) for significance, given the ‘abstract factor space’ deriving from principal component analysis (PCA). Malinowski observes that this procedure “brings target factor analysis from the quagmire of heuristic reasoning to the realm of statistical inference.”
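Malinowski's F-test itself is not reproduced here. As a simpler stand-in that relies on the same assumption of homogeneous, uncorrelated errors, the sketch below counts singular values of the data matrix above the approximate largest singular value of a pure-noise n × p matrix, σ(√n + √p). It assumes the error standard deviation σ is known in advance, as point 1 of the metrological context allows. The safety margin of 1.1 and the function name are hypothetical choices.

```python
import numpy as np


def estimate_rank_known_noise(y, sigma, margin=1.1):
    """Number of singular values of y above a pure-noise bound.

    Assumes i.i.d. errors with known standard deviation sigma. For an n x p
    matrix of such errors the largest singular value is close to
    sigma * (sqrt(n) + sqrt(p)); 'margin' is a hypothetical safety factor.
    """
    y = np.asarray(y, float)
    n, p = y.shape
    threshold = margin * sigma * (np.sqrt(n) + np.sqrt(p))
    svals = np.linalg.svd(y, compute_uv=False)
    return int(np.sum(svals > threshold))


rng = np.random.default_rng(6)
x = rng.uniform(0.0, 1.0, size=(50, 3))          # hypothetical concentrations
A = rng.uniform(0.0, 1.0, size=(3, 30))          # hypothetical profiles
y = x @ A + rng.normal(scale=0.01, size=(50, 30))
print("estimated number of components:", estimate_rank_known_noise(y, sigma=0.01))
```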

Target factor analysis (6) is one of the approaches for deriving chemically meaningful factors for use with eq. (1). It addresses the second issue, namely model uniqueness, in Gleser's terminology. Among other recommended approaches, perhaps the most famous is ‘self-modeling curve resolution’, invented by Lawton and Sylvestre (7). This technique was developed for two-component systems, and it works well if the samples reasonably span the factor space. The extreme samples set inner limits for the unknown spectra or profiles, and non-negativity constraints set outer limits. If unique variables exist for each of the chemical components, then spectra rather than spectral bands may be estimated. Other workers later extended the Lawton and Sylvestre approach to three (8) or more components (9). Uncertainties for estimated end-member (isolated) spectra have been derived by the error propagation technique of Roscoe and Hopke (10,11). Other means for deriving chemical factors take into account clustering of loadings using the variance diagram technique (12), incorporate physicochemical modeling (13), or compare derived FA spectral windows with spectrochemical databases (14). For an excellent review of the several approaches to ‘mixture (factor) analysis’ see Gemperline (15,16).

The question of finding mutually exclusive, factor-specific (unique) variables is closely related to the ‘MLR(T)’ technique. Here, one designs the measurement process to contain as many unique tracers as possible. Multiple Linear Regression on the tracer species then produces spectral or profile estimates for the corresponding sources. This has been especially useful in sorting out the information contained in environmental (mass balance) data matrices (17-19).
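The MLR(T) idea can be sketched with two hypothetical sources, each carrying one unique tracer species. Regressing every species on the tracer concentrations (no intercept) returns each source profile divided by its own tracer content. Absolute profiles therefore still require the tracer fractions from elsewhere. The profiles, contributions and the 1% multiplicative error are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)
n_samples = 60
# Hypothetical source profiles over 5 species; species 0 is a unique tracer
# for source A and species 1 a unique tracer for source B.
profiles = np.array([[0.10, 0.00, 0.30, 0.05, 0.20],    # source A
                     [0.00, 0.08, 0.10, 0.25, 0.15]])   # source B
s = rng.uniform(1.0, 10.0, size=(n_samples, 2))         # source contributions
conc = s @ profiles
conc *= 1 + rng.normal(scale=0.01, size=conc.shape)     # 1% measurement error

tracers = conc[:, [0, 1]]
# Regress every species on the two tracer concentrations (no intercept).
coef = np.linalg.lstsq(tracers, conc, rcond=None)[0]    # shape (2, 5)

# Row j of coef estimates profiles[j] / (tracer content of source j)
expected = profiles / profiles[[0, 1], [0, 1]][:, None]
print(np.round(coef, 3))
print(np.round(expected, 3))
```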

Third, almost without exception, experts in chemical factor analysis (as embodied in eq. (1)) recommend avoiding standardization of the data matrix prior to factor analysis. This is in keeping with the assumption of error homogeneity, and with Gleser's comment (Section 3) regarding misuse of the sample correlation matrix. On the other hand, if variables are measured on quite different scales, or exhibit quite different measurement errors, then initial ‘scaling’ (standardization) is recommended (20). That means use of a correlation matrix. Quoting Mellinger (21), “the covariance-variance matrix may be used... only when the variables have essentially equal variances.” An interesting discussion of the four alternatives (centering or not, scaling or not) is given by Malinowski and Howery (6). Thurston and Spengler developed a device for using standard FA software (which centers data) to perform FA about the origin for environmental source apportionment, using a fictional null vector (22).

Not unrelated to the question of scaling is the fourth issue, data matrix weighting. As noted by Gleser in his discussion of identifiability, classical FA models treat the error-covariance matrices as though they were equal to the same diagonal matrix, independent of sample. This assumption generally does not hold in chemical applications, for several reasons. The primary reason is that chemical measurement error usually increases with increasing concentration, and the concentration of a given element (chemical variable) may vary widely depending on both the relative and absolute amounts of the predominant components in a given sample. A log transform might help when the relative standard deviation is fixed. A weighted FA solution to the problem has been offered by Cochran and Horne [23], where the variance of data matrix element y_ij is treated as a product function characteristic of row i and column j. These authors demonstrated that classical PCA, which ignores this row-column dependence of the variance, leads to incorrect results.
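When the error variance of each element factors as σ²_ij = r_i c_j, a weighted least-squares low-rank fit reduces to an ordinary singular value decomposition of a rescaled matrix, because the weighted criterion equals the unweighted one for D_r^(-1/2) Y D_c^(-1/2). The sketch below illustrates that equivalence; it assumes r_i and c_j are known, uses hypothetical names, and is not presented as the algorithm of ref. 23.

```python
import numpy as np

def weighted_low_rank(Y, row_var, col_var, rank):
    """Rank-`rank` fit of Y when Var(e_ij) = row_var[i] * col_var[j].

    Minimising sum((y_ij - m_ij)**2 / (row_var[i] * col_var[j])) is equivalent
    to a truncated SVD of the scaled matrix D_r^(-1/2) Y D_c^(-1/2); the fit is
    then scaled back to the original units.
    """
    Y = np.asarray(Y, dtype=float)
    r = np.sqrt(np.asarray(row_var, dtype=float))[:, None]
    c = np.sqrt(np.asarray(col_var, dtype=float))[None, :]
    U, s, Vt = np.linalg.svd(Y / r / c, full_matrices=False)
    M_scaled = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return M_scaled * r * c

def weighted_sse(Y, M, row_var, col_var):
    """Weighted residual sum of squares (the fitting criterion above)."""
    W = 1.0 / np.outer(row_var, col_var)
    return float(np.sum(W * (np.asarray(Y) - M) ** 2))
```

An unweighted truncated SVD of Y is also a rank-`rank` matrix, so by construction it can never beat the weighted fit on the weighted criterion; the two differ most when the row and column variance factors span a wide range.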

The fifth issue relates more specifically to identifiability, i.e., the confounding of covariance among chemical components with that associated with their measurement errors. The problem derives from the fact that chemical concentrations (level-2 in the metrological level diagram) are often estimated from a least squares fit to overlapping signals from level-1. This happens, for example, in the deconvolution of a gamma-ray multiplet, and in corrections for mutual interference in optical or X-ray spectrometry. Thus the error-covariance matrix for the response data matrix used in FA is not necessarily diagonal. Methods may exist for treating known off-diagonal elements in FA, but if left untreated, these elements will confound the component estimates. Further comments on this issue will be given in the section on measurement refinement.
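The origin of the off-diagonal terms can be seen in a small example. When two overlapping peaks are resolved by weighted least squares, the covariance of the estimated intensities is (A′WA)⁻¹, whose off-diagonal element is nonzero whenever the peak shapes overlap. The sketch assumes independent channel errors with known standard deviations and uses hypothetical Gaussian peak shapes and intensities.

```python
import numpy as np

def ls_estimates_and_covariance(A, y, sigma):
    """Weighted least-squares resolution of overlapping component signals.

    A     : (n_channels, n_components) reference signal shapes (level-1 design)
    y     : (n_channels,) measured spectrum
    sigma : (n_channels,) standard deviations of independent channel errors
    Returns the component estimates and their covariance matrix (A'WA)^-1.
    """
    W = np.diag(1.0 / np.asarray(sigma, dtype=float) ** 2)
    cov = np.linalg.inv(A.T @ W @ A)
    est = cov @ (A.T @ W @ y)
    return est, cov

def correlation_from_cov(cov):
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)

# Hypothetical example: two Gaussian peaks, 10 channels apart, width 6 channels
channels = np.arange(100)

def gaussian(center, width=6.0):
    return np.exp(-0.5 * ((channels - center) / width) ** 2)

A = np.column_stack([gaussian(45.0), gaussian(55.0)])
y = A @ np.array([1000.0, 500.0])                   # noise-free spectrum
est, cov = ls_estimates_and_covariance(A, y, np.sqrt(y + 1.0))
rho = correlation_from_cov(cov)[0, 1]               # negative for overlapping peaks
```

For well-separated peaks the same calculation gives a correlation near zero, so the induced covariance is a direct consequence of the signal overlap rather than of the fitting procedure itself.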


Sixth, and last, is the matter of random sampling. In Section 4 of his paper, Gleser observes that all of the quantities in the linear mass balance models, though containing a time index, are assumed to be random, and that time series correlation should be made negligible by the sampling strategy. This may be possible in a number of instances, but in many chemical experiments time (and space) variations of chemical component intensities are turned into an advantage. One illustration is found in chromatography, and in the relatively new technique of evolutionary factor analysis [24]. Here, cyclic appearances and disappearances of components in time-partitioned data matrices are detected as periodically changing numbers (r) of chemically significant principal components. The time sequence of changes in the number of significant components serves as the first step in identification of species that have different chromatographic elution times. Clearly, analogous temporal phenomena are associated with the transport of atmospheric species, so evolutionary factor analysis could become a very important part of linear mass balance modeling.
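The forward pass of evolutionary factor analysis can be sketched as follows: singular values are computed for a growing window of rows (time points), and the number exceeding a noise threshold gives the running count r of significant components. This is a minimal illustration assuming a time-ordered data matrix and a user-chosen threshold; the function names are hypothetical.

```python
import numpy as np

def forward_efa(D, n_sv=3):
    """Forward evolving factor analysis.

    D is an (n_times, n_channels) data matrix ordered in time (e.g. elution time).
    Row k-1 of the result holds the leading singular values of the first k rows;
    a singular value rising above the noise level marks a newly appearing component.
    """
    D = np.asarray(D, dtype=float)
    out = np.zeros((D.shape[0], n_sv))
    for k in range(1, D.shape[0] + 1):
        s = np.linalg.svd(D[:k], compute_uv=False)
        m = min(n_sv, s.size)
        out[k - 1, :m] = s[:m]
    return out

def rank_profile(sv, threshold):
    """Number of singular values above `threshold` in each window (the r sequence)."""
    return (sv > threshold).sum(axis=1)
```

For two components eluting one after the other, the forward count rises from 0 to 1 to 2. Applying the same computation to the rows in reverse order, `forward_efa(D[::-1])[::-1]`, gives the backward curves that reveal disappearances.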


ERRORS-IN-VARIABLES REGRESSION

Gleser’s ‘new’ model (his eq. (19)) serves as an excellent conjunction linking the discussion of FA and errors-in-variables regression (EVAR), for it promises incorporation of the best features of each, while compensating for some common deficiencies. Of special interest is the utilization of both the full sample data matrix and prior estimates of the factor loading matrix (chemical spectra or profiles). Classical FA ignores this prior information, while classical EVAR treats data from only one sample at a time. At the Quail Roost-II Workshop on Receptor Modeling via Chemical Mass Balance and Factor Analysis Models, some creative attempts were made to incorporate these two types of information, but no generally satisfactory solution was put forth [25]. Later analyses, based on the same data sets, showed further creative approaches, such as linear programming (LP) and partial least squares (PLS) [26-28]. The PLS solution, in fact, was a two-block factor analytic technique that related the principal eigenvectors of the source profile matrix to those of the sample data matrix; i.e., as with Gleser’s new model, it utilized all of the samples together with prior estimates of the source profiles. Comments on other advantages of Gleser’s new model will appear in the next section.

Returning to ‘single sample’ EVAR, it is noteworthy that maximum likelihood estimation (MLE) for two-variable chemical problems has long been recognized as important. MLE has been employed especially in intercalibrations involving two measured variables and in intercomparisons involving two laboratories. Both biased and unbiased methods for incorporating the concomitant ‘errors-in-x’ are found in the chemical literature [29,30]. Multivariate manifestations are found in the areas of multicomponent gamma-ray spectrometry and multicomponent source apportionment (chemical mass balance modeling) [31,32].
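For the two-variable case, one widely used maximum likelihood estimator, when both variables carry independent normal errors with a known variance ratio, is Deming regression. The sketch below is offered only as an example of an errors-in-x treatment; it is not necessarily the method of refs. 29 and 30, and it assumes constant, known error variances (only their ratio enters the slope).

```python
import numpy as np

def deming(x, y, var_x, var_y):
    """Straight-line fit y = a + b*x when both x and y carry measurement error.

    Assumes independent normal errors with known constant variances var_x, var_y.
    Returns (intercept, slope).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lam = var_y / var_x                      # error-variance ratio
    xm, ym = x.mean(), y.mean()
    sxx = np.mean((x - xm) ** 2)
    syy = np.mean((y - ym) ** 2)
    sxy = np.mean((x - xm) * (y - ym))
    d = syy - lam * sxx
    b = (d + np.sqrt(d * d + 4.0 * lam * sxy * sxy)) / (2.0 * sxy)
    return ym - b * xm, b
```

With errors in x, the ordinary least squares slope is attenuated toward zero; the Deming slope removes that attenuation when the variance ratio is correctly specified, and it reduces to orthogonal regression when the ratio is one.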

Because of the importance of this topic in modern environmental and analytical chemistry, Beebe and Currie undertook an empirical evaluation of popular algorithms and software for treating the problem [33]. Specifically, the methods mentioned in Gleser’s paper, effective variance weighted least squares (EVWLS), orthogonal distance regression (ODR) [34] and the MLE (structural model) method of Fuller [35], were tested with bi- and trivariate data sets having known structure. Details will be found in ref. 33, but two of the essential conclusions were that ODR was relatively less precise, but unbiased, while EVWLS gave accurate precision estimates and was as precise as MLE, but biased. This was surprising, because the formulation of EVWLS in ref. 32 seemed equivalent to MLE. On further reading, however, one finds an approximation that makes its implementation equivalent to iteratively weighted least squares (IWLS), which is known to produce biased estimates [29]. This is a rather serious discovery, for EVWLS is the currently accepted method for chemical mass balance (regression) calculations.
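For concreteness, an effective variance iteration for chemical mass balance can be sketched as below. It assumes the commonly published form of the effective variance, V_i = σ²_C,i + Σ_j S_j² σ²_F,ij, with independent errors; the names are hypothetical. Because each pass reweights using the current estimates, it is an iteratively weighted least squares scheme of the kind identified above as biased, so the sketch shows the mechanics, not a remedy.

```python
import numpy as np

def cmb_effective_variance(C, sigma_C, F, sigma_F, n_iter=50, tol=1e-10):
    """Chemical mass balance by effective-variance weighted least squares.

    C       : (n_species,) ambient concentrations at the receptor
    sigma_C : (n_species,) their standard deviations
    F       : (n_species, n_sources) source profiles
    sigma_F : (n_species, n_sources) standard deviations of the profiles
    The weights 1 / (sigma_C**2 + sum_j S_j**2 sigma_F_ij**2) depend on the
    current source contributions S, so the fit is iterated.
    """
    C = np.asarray(C, dtype=float)
    F = np.asarray(F, dtype=float)
    sigma_C = np.asarray(sigma_C, dtype=float)
    sigma_F = np.asarray(sigma_F, dtype=float)
    S = np.linalg.lstsq(F, C, rcond=None)[0]        # unweighted starting values
    for _ in range(n_iter):
        V = sigma_C ** 2 + (sigma_F ** 2) @ (S ** 2)  # effective variances
        FtW = F.T / V                                 # F' V^-1 without a diagonal matrix
        S_new = np.linalg.solve(FtW @ F, FtW @ C)
        converged = np.max(np.abs(S_new - S)) < tol * (1.0 + np.max(np.abs(S)))
        S = S_new
        if converged:
            break
    cov_S = np.linalg.inv(FtW @ F)                   # precision estimate from last weights
    return S, cov_S
```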

Gleser’s proposals for correcting for attenuation (bias) are especially welcome, given the foregoing observation. The reliability matrix (his eq. (13)) and the expanded regression model error (below eq. (15)) hold the key. This very facile solution to an important class of chemical problems is all the more practicable because it can be applied using standard linear regression software. The ‘price’ we must pay, estimation of the reliability matrix, is not unreasonable. As Gleser shows in ref. 36, the covariances comprising the reliability matrix come directly from: (a) the set of observed variable values (Σ_X), and (b) the difference between Σ_X and the covariance matrix of measurement errors. These latter are the same errors (variances) we now employ in EVWLS and IWLS; they may be estimated through replication or ‘theory’.
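A moment-based correction for attenuation can be sketched with ordinary least squares. Assuming the reliability matrix has the usual form K = S_XX⁻¹(S_XX − Σ_u), where S_XX is the sample covariance of the observed regressors and Σ_u the known measurement-error covariance (our notation, which may differ from Gleser’s eq. (13)), the corrected slopes are K⁻¹ times the naive ones. Regressing y on the ‘calibrated’ regressors X̄ + (X − X̄)K with standard software gives the same slopes, which is the practical point made above.

```python
import numpy as np

def attenuation_corrected_slopes(X, y, Sigma_u):
    """Naive and attenuation-corrected slopes, plus the reliability matrix K.

    X       : (n, p) observed regressors (true values plus measurement error)
    y       : (n,) response
    Sigma_u : (p, p) known covariance matrix of the measurement errors in X
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    n = X.shape[0]
    S_XX = Xc.T @ Xc / (n - 1)
    S_Xy = Xc.T @ yc / (n - 1)
    K = np.linalg.solve(S_XX, S_XX - Sigma_u)      # S_XX^-1 (S_XX - Sigma_u)
    beta_naive = np.linalg.solve(S_XX, S_Xy)
    beta_corr = np.linalg.solve(K, beta_naive)     # equals (S_XX - Sigma_u)^-1 S_Xy
    return beta_naive, beta_corr, K

def regression_calibration(X, y, Sigma_u):
    """Same correction via ordinary least squares on X_hat = mean + (X - mean) K."""
    X = np.asarray(X, dtype=float)
    _, _, K = attenuation_corrected_slopes(X, y, Sigma_u)
    X_hat = X.mean(axis=0) + (X - X.mean(axis=0)) @ K
    design = np.column_stack([np.ones(len(X_hat)), X_hat])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef                                    # intercept followed by slopes
```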


MEASUREMENT  REFINEMENT 

In the last parts of this discussion I should like to comment on aspects of the problem where the chemist can make his most important contributions, given the insights concerning measurement error models provided by the mathematician-statistician. This represents the synergism which is the true benefit of cross-disciplinary research. By refining the measurement process, the chemist can reduce or eliminate errors associated with multicollinearity, identifiability, and certainly model uniqueness. By model refinement, using known physicochemical relationships, otherwise erroneous linear model assumptions may be averted.

Perhaps the most obvious measurement refinement relates to the relative magnitudes of the measurement errors across species and/or samples. (Reducing the absolute magnitudes of the measurement errors, of course, always helps; this should be done to the extent feasible.) Planning measurements to control the relative magnitudes of measurement errors is interesting because it can influence multicollinearity. For example, the matrix to be inverted in weighted regression analysis is A′WA, where A is the design matrix and W is the diagonal matrix of weights (inverse variances). Altering the relative weights thus alters the ‘condition’ of this critical matrix of linear regression. In fact, an optimum may be achieved by maximizing the determinant of this matrix, the Fisher information matrix [18]. Chemical insight bears on this issue in two ways: deciding which variables are most important for increased weight (this depends on the mix of likely source components), and deciding how to accomplish the measurement task. When weights depend on signal magnitude, as they often do in chemical measurements, iteration is necessary to take into account the y dependence. The basic question is one of iterative, intelligent design of the chemical measurement process.
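The effect of relative weights on A′WA can be checked numerically. In the hypothetical three-species, two-source design below (values invented for illustration), the third species is the one that best separates the sources; reducing its measurement variance tenfold raises det(A′WA) and lowers the condition number.

```python
import numpy as np

def information_matrix(A, variances):
    """A' W A with W = diag(1/variances): the matrix inverted in weighted regression."""
    A = np.asarray(A, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    return (A.T * w) @ A

def design_summary(A, variances):
    """Determinant (D-optimality criterion) and condition number of A' W A."""
    M = information_matrix(A, variances)
    return np.linalg.det(M), np.linalg.cond(M)

# Hypothetical design: rows = species, columns = source profiles
A = np.array([[1.0, 0.9],
              [0.5, 0.45],
              [0.1, 0.6]])
det_equal, cond_equal = design_summary(A, [1.0, 1.0, 1.0])
det_better, cond_better = design_summary(A, [1.0, 1.0, 0.1])  # species 3 measured more precisely
```

The first two rows are proportional, so on their own they cannot separate the two sources at all; improving the precision of the third species is therefore worth far more than improving either of the first two.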

Closely related is the issue of chemical interference and the corresponding off-diagonal elements of the sample covariance matrix. This is a very real issue for overlapping spectra or chromatographic peaks in laboratory analysis, and it has important consequences for environmental mass balance studies where level-2 metrological data (estimated chemical concentrations) are employed in level-3 models. Covariance among concentration estimates must be avoided for classical FA; quoting Anderson, “an essential assumption is that the (error) covariance matrix is diagonal” [37]. To achieve this costs money. To illustrate the point, in air particulate receptor modeling it is common to measure a host of element concentrations using X-ray fluorescence analysis (XRF). The method is inexpensive (ca. $40/sample) but insensitive for certain elements (e.g., those with low atomic number, such as carbon and boron), and exhibits interferences for others (e.g., lead L X-rays interfere with arsenic K X-rays). Correction for interference, often done by regression techniques, necessarily induces covariance between the estimated (corrected) concentrations. A more expensive technique (by a factor of three to five), neutron activation analysis (NAA), will often overcome both limitations, though special interferences (dependent on nuclear properties) may occur here. Unique tracer techniques generally cost even more, but they may eliminate collinearity among certain sources, and often the specialized, single-species measurement process has no interspecies interference. The price is higher. A case in point is ¹⁴C, which we measure to unambiguously resolve fossil from biospheric carbon sources (cost: ca. two to five times that of NAA).

Use of ¹⁴C illustrates measurement refinement by paying attention to the chemical question concerning what to measure. By employing a tracer of this sort that is both unique and absolute, one can accomplish other ends. Namely, inexpensive (XRF) unique tracers (mineral-corrected potassium, lead) that are not absolute can be calibrated, thus achieving reliability for a given airshed, but at reduced cost [18,38]. Reliable (orthogonal) tracers can also be added to the design of the overall experiment. An example is a recent EPA-sponsored study of carbonaceous aerosol sources in Roanoke, VA, U.S.A. Here, ¹⁴C was employed in the validation/calibration mode discussed above; this step resolved wood-burning carbon from fossil carbon in the atmosphere. As a second step, stable rare earth isotopes were purposely added to label fuel oil in the area. Their signatures provided added ‘orthogonal’ resolution of this component of the atmospheric soot from the fossil component from motor vehicles [39]. A statement by Rao marvelously supports the philosophy of such approaches to measurement refinement in quite another field: “Possibly what is wrong with the economists is that they are not trying to refine their measurements or trying to measure new variables which cause economic change. That is far more important than dabbling with whatever data are available and trying to make predictions based on them” [40].


MODEL  REFINEMENT 

Not far removed is the subject of model refinement. Gleser’s proposed model (eq. (19)) speaks to this. As recognized also by Cheng and Hopke [26], real receptor models are not linear. There are selective changes in particle composition during transport, including physical effects (agglomeration, settling) and chemical effects (reaction). I believe that the most effective way to account for such nonlinearities is to employ carefully constructed physicochemical models of the respective processes. The alternative, which will not be further discussed here, is to use chemical knowledge and data to select those species that are ‘chemically robust’, i.e., conservative (linear) tracers that resist change, isotopes and nonreactive gases being classic examples. Physicochemical modeling for source apportionment has been dubbed ‘hybrid modeling’. Examples are seen in the use of reaction rate constants to help model the gas-to-particle conversion of sulfur dioxide to sulfate [41] and the selective oxidation of polycyclic aromatic hydrocarbons during atmospheric transport [42]. An interesting statistical challenge, for better representing ‘real’ behavior, would be to describe individual source profiles (columns of the A matrix, eq. (1)) as empirical principal component class models [2] to serve as prior information for source apportionment by FA and EVAR.
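A toy version of the hybrid idea: a source profile is ‘aged’ by first-order loss of a reactive species during transport, and the lost mass is credited to its product. This is a minimal sketch under strong assumptions (a single first-order rate constant, a known transport time, a fixed product yield); the profile and rate constant in the example are hypothetical, and the sketch is not the model of refs. 41 or 42.

```python
import math

def aged_profile(profile, rate, t, products=None):
    """Return a source profile adjusted for first-order loss during transport.

    profile  : dict species -> abundance at the source
    rate     : dict species -> first-order loss constant (1/h); unlisted species are conservative
    t        : transport time (h)
    products : dict reactant -> (product species, mass of product formed per unit mass lost)
    """
    aged = dict(profile)
    for species, k in rate.items():
        lost = profile[species] * (1.0 - math.exp(-k * t))
        aged[species] -= lost
        if products and species in products:
            product, mass_ratio = products[species]
            aged[product] = aged.get(product, 0.0) + mass_ratio * lost
    return aged

# Hypothetical profile (mass fractions) with SO2 -> sulfate conversion;
# 96/64 is the approximate mass of SO4 formed per unit mass of SO2 converted.
source = {"SO2": 0.30, "SO4": 0.02, "Se": 0.001}
receptor_profile = aged_profile(source, {"SO2": 0.01}, t=12.0,
                                products={"SO2": ("SO4", 96.0 / 64.0)})
```

The conservative tracer (here Se) passes through unchanged, which is exactly the property that makes such ‘chemically robust’ species attractive in linear receptor models.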

Model refinement can be considered in a larger, more generic sense. Realizing that our models are imperfect ‘cartoons’ or caricatures of reality, generally emphasizing (distorting?) particular perspectives or parameters, it is meaningful to consider classes of models having varying degrees of refinement (and corresponding increases in cost). In atmospheric chemistry, for example, we may look beyond the relatively simple hybrid models mentioned above to two- and three-dimensional (spatial) models of the temporal processes taking place. Such ‘full dynamic modeling’ relies heavily on the highest quality numerical methods, plus statistics, but it must be fundamentally based on sound, detailed physical and chemical analysis of the system. Pertinent illustrations of such model classes are given in Table 1, for atmospheric chemistry together with two other fields of endeavor. This viewpoint was presented for atmospheric modeling in ref. 43; it was inspired by Hofstadter [44].

Considerable insight into the relation between model realism and viewpoint, and metrological accuracy, can be gained by examining the evolution of oceanographic models. Like atmospheric models, they have been designed to describe the state of the fluid system, including concentrations and transport of chemical constituents. In both areas of environmental science, the simplest models frequently serve quite well for estimation and prediction of a limited set of parameters. In oceanography, one of the driving forces has been the need to understand the effect of anthropogenic carbon dioxide perturbations on the atmosphere-ocean system, a central problem for forecasts of global warming. The earliest models simply treated the spatially averaged atmosphere and world oceans as two or three reservoirs [45,46]. Far more realistic is the box-diffusion model for vertical transport in the ocean, which treats the upper layer as well mixed and describes the ocean below the thermocline as an infinite set of boxes, i.e., as a diffusive medium [47]. This model, which still describes a fictitious ‘average’ ocean, has been compared with more realistic representations of the ocean which take into account horizontal transport as well as upwelling of deep ocean water in the equatorial zone and downwelling in the temperate and polar zones. It was found that the box-diffusion model “gives an excellent representation of atmospheric CO2 and ¹⁴CO2 interactions on time scales up to several tens of years”, and hence of near-term effects of fossil fuel combustion on global climate [48]. Expanding the temporal scale (to glacial times) and the number of chemical variables observed required a considerably more complex (realistic?) model, ‘Pandora’s Box’ [49]*.

TABLE 1
Model refinement

Music [44]         Atmospheric science              Oceanography
Muzak              Linear models [19]               2-box [45,46]
                   (conservative tracer)            (above/below thermocline)
Jazz               Hybrid [26]                      Box-diffusion [47]
                   (SO4 [41], PAH [42])             (surface, deep ocean)
Classical music    1-, 2-, 3-D dynamic              ‘Pandora’ [49]
                   reacting system                  (multicompartment/flows)
        ↓
     reality


* Another cogent illustration of environmental model complexity and relevance has just come to my attention, from the field of ground water hydrology. As with the several imperfect views of the ocean (and the classic multiple perspectives of the elephant), the particular perspective of reality embodied in the hydrological model (or ‘cartoon’) determined its predictive validity. In this case, a construct was created to describe the behavior of ground water in fractured zones, and it was parameterized with the most accessible observable, the fluctuating ground water level. Once calibrated, the model did well at predicting ground water levels; but when a new need arose, forecasting transport of pollutants, it failed completely (E.A. Prych, personal communication, 1990).
Thus, enormous advances in geochemical sampling, multivariable chemical measurement, and computational power make possible model refinements that approach reality. The challenge to the chemist and statistician, however, is to define just what level of complexity is appropriate, i.e., to provide guidance as to the nature and magnitude of errors in measurements and in models that are actually relevant.
Abstract


Thompson, A.M. and Stewart, R.W., 1991. How chemical kinetics uncertainties affect concentrations computed in an atmospheric photochemical model. Chemometrics and Intelligent Laboratory Systems, 10: 69-79.

Tropospheric photochemical models are used increasingly as predictive tools to assess the chemical response of the lower atmosphere to changes in physical and chemical conditions which influence trace species distributions. Among the many uncertainties in the modeling process are imprecisions in reaction rate data used in formulating model continuity equations. In this paper we evaluate the propagation of these kinetics uncertainties to computed species distributions in a photochemical model.

A one-dimensional kinetics-diffusion model having 72 reactions among 24 species is used. Non-chemical sources and initial background concentrations are chosen to be representative of clean continental mid-latitude air. Chemical reaction rate data are mostly those of the NASA Kinetics Evaluation Panel No. 8 (1987) and include imprecisions in photolysis rates and in binary and ternary reactions. A Monte Carlo technique is used to estimate uncertainties in computed concentrations due to the given rate uncertainties.

We compute uncertainties in odd hydrogen species (the radicals OH and HO2) and in hydrogen peroxide ranging from 22-41%. Uncertainties for O3 and CO are, respectively, 17% and 30%. Odd nitrogen uncertainties range from 18% for NO to 72% for N2O5. The smallest uncertainty is that for nitric acid, at 6%, but this neglects uncertainties in physical sources and sinks, such as precipitation scavenging. The uncertainty in OH (31%) is important when using the model to predict tropospheric oxidant levels, because OH determines the lifetime of numerous naturally and anthropogenically emitted trace gases.


INTRODUCTION 

One-dimensional photochemical models are used to simulate vertical profiles of trace gas distributions (O3, NOx, CO, OH, H2O2) in the atmosphere. We have used a model of the troposphere to predict changes in atmospheric composition, primarily levels of the oxidants O3, OH, and H2O2, as emissions of NO, CO, and CH4 change over the next several decades [1-3]. We also use the model to interpret trace gas measurements in selected field experiments, calculating ozone production in convective situations [4].

In both predictive and interpretive modes, the photochemical model gives results that are uncertain at least to the degree that key photochemical reaction rates are uncertain and mechanistic pathways for some reactions are not known in detail. We have evaluated some of these effects and report here on an investigation of uncertainties in calculated trace gas concentrations due to the imprecision of photochemical reaction rates.
A Monte Carlo method is used to specify sets of photochemical reaction rates, with means and uncertainties given from a standard tabulation of kinetics and absorption spectra. The overall uncertainty or likely range of concentrations for a given species is determined by hundreds of runs in which each rate coefficient is selected randomly and a steady-state solution is computed for all species. The Monte Carlo rate kinetics study is carried out for one type of background chemistry, simulating a northern mid-latitude continental environment. Each solution describes a unique atmospheric composition, and when these are averaged together, the mean is taken as representative of this type of chemical regime. In a related study [5] we report on how species uncertainties vary with mean composition when other chemical environments are simulated.
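The Monte Carlo procedure can be illustrated on a deliberately tiny ‘model’. The sketch below samples rate coefficients lognormally, treating the NASA/JPL uncertainty factor f(T) = f(298) exp|ΔE/R (1/T − 1/298)| as a one-standard-deviation multiplicative factor (an assumption; the exact sampling scheme used in the paper is not given in this section), and propagates them through a one-line steady-state expression for OH, [OH] = P/(k_CO[CO] + k_CH4[CH4]). Rate parameters are patterned on reactions 24 and 26 of Table 3; the production rate, concentrations, temperature, and the ~40% lognormal spread in P are hypothetical.

```python
import numpy as np

def sample_rate(A, E_R, f298, dE_R, T, n, rng):
    """Draw n values of k(T) = A exp(-(E/R)/T) with lognormal scatter.

    The uncertainty factor f(T) = f298 * exp(|dE_R * (1/T - 1/298)|) is treated
    as a one-standard-deviation multiplicative factor, i.e. ln k ~ Normal.
    """
    k_central = A * np.exp(-E_R / T)
    f_T = f298 * np.exp(abs(dE_R * (1.0 / T - 1.0 / 298.0)))
    return k_central * np.exp(rng.normal(0.0, np.log(f_T), size=n))

def monte_carlo_oh(n_runs=800, T=280.0, P=1.0e6, co=2.5e12, ch4=4.3e13, seed=0):
    """Toy steady state [OH] = P / (k_CO [CO] + k_CH4 [CH4]).

    P in molecules cm^-3 s^-1 and concentrations in molecules cm^-3 (hypothetical).
    Returns the mean [OH] and its percent standard deviation over the runs.
    """
    rng = np.random.default_rng(seed)
    k_ch4 = sample_rate(2.3e-12, 1700.0, 1.2, 200.0, T, n_runs, rng)
    k_co = sample_rate(1.5e-13 * 1.6, 0.0, 1.3, 300.0, T, n_runs, rng)  # (1 + 0.6 P_atm) at 1 atm
    prod = P * np.exp(rng.normal(0.0, 0.4, size=n_runs))               # assumed ~40% spread
    oh = prod / (k_co * co + k_ch4 * ch4)
    return oh.mean(), 100.0 * oh.std(ddof=1) / oh.mean()
```

The second return value is the same summary statistic quoted in the abstract (percent standard deviation across runs); the number this toy produces says nothing about the real 72-reaction model.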

METHOD 

Photochemical  model 

A one-dimensional photochemical-kinetics model solves the continuity equation for the concentration of the ith species, c_i, as a function of time, t:

$$\frac{\partial c_i(z,t)}{\partial t} = \frac{\partial}{\partial z}\left[K(z)\,N(z)\,\frac{\partial X_i(z,t)}{\partial z}\right] + P_i(z,t) - L_i(z,t) \qquad (1)$$

where z = altitude (cm, in our model); K(z) is an eddy diffusion coefficient (in cm² s⁻¹, assumed to be time-independent); N(z) is molecular density (cm⁻³); X_i(z, t) is the mixing ratio or mole fraction of species i. P_i(z, t) and L_i(z, t) are photochemical production and loss terms, respectively, for species i. Photochemical reactions making up production and loss include photodissociation or thermal dissociation reactions, in which species i is a fragment formed by a unimolecular process; bimolecular reactions between two free radicals or between a free radical and a nonradical species; and three-body processes in which combination of two radicals, in concert with an energetically stabilizing third body, leads to formation of a nonradical molecule.

Our  photochemical  model  spans  0-15  km  (the 
latter  taken  as  mean  height  of  the  tropopause) 


TABLE 1

Trace gases and boundary conditions in photochemical model

Species                                          Boundary condition

Upper boundary (15 km)
O3                                               influx, 5 × 10^? cm⁻² s⁻¹
O(3P)                                            influx, 4 × 10^? cm⁻² s⁻¹
CH3, CH3O, CH3O2, CH3OOH, C2H5O2,                photochemical equilibrium
  H2O2, C2H5OOH, CH3CO3, H, OH, HO2
NOy (NO + NO2 + NO3 + HNO3 + HNO4 + 2N2O5)       influx, 2.5 × 10^? cm⁻² s⁻¹
H2CO, PAN, CH3CHO                                zero flux
CO, C2H6                                         troposphere-to-stratosphere transfer

Lower boundary (0 km)
O3                                               deposition
O(3P)                                            deposition
CH3, CH3O, CH3O2, C2H5O2, CH3CO3, H, OH, HO2     photochemical equilibrium
NO                                               flux
NO2                                              deposition
NO3, N2O5                                        deposition
PAN*                                             deposition
H2CO, CH3OOH, CH3CHO, C2H5OOH*                   deposition
H2O2, HNO3, HNO4*                                deposition
C2H6                                             fixed, 1.5 ppbv
CO                                               flux

* These species are also rained out, with first-order removal below 6 km.
TABLE 2

Photolysis reactions

Model      Photodissociation                  % Standard dev.,    % Standard dev.*,
reaction                                      NASA/JPL [8]        Monte Carlo
J1         O3 + hν → O2 + O                   10                  9.7
J2         O3 + hν → O2 + O(1D)               40                  40
J3         NO2 + hν → NO + O                  30                  30
J4         HNO3 + hν → OH + NO2               30                  29
J5         H2O2 + hν → OH + OH                40                  43
J6         NO3 + hν → NO + O2                 100                 83
J7         NO3 + hν → NO2 + O                 100                 81
J8         H2CO + hν → H + HCO                40                  38
J9         CH3OOH + hν → OH + CH3O            40                  37
J10        HNO4 + hν → HO2 + NO2              100                 89
J11        CH3CHO + hν → CH3 + HCO            40**                37
J12        N2O5 + hν → NO2 + NO3              100                 82
J13        H2CO + hν → H2 + CO                40                  35
J14        C2H5OOH + hν → C2H5O + OH          40***               37
J15        PAN + hν → CH3CO3 + NO2            30†                 27

* From 800 model runs.
** Specified uncertainty assumed in analogy with H2CO.
*** Specified uncertainty assumed in analogy with CH3OOH.
† Specified uncertainty assumed in analogy with HNO3.


with 24 grid points [1,6]. Spacing is at 1-km intervals between 1 and 15 km, and on a refined grid below 1 km to give better simulations of gradients in the boundary layer. Several types of boundary conditions are specified, depending on the species: photochemical equilibrium, flux, fixed mixing ratio, or removal at the surface or tropopause with a specified transfer velocity. We calculate vertical profiles of 24 trace species, a standard complement of odd oxygen (O3, O(3P)), odd hydrogen (H, OH, HO2), odd nitrogen (NO, NO2, NO3, N2O5, HNO3, HNO4 (HO2NO2)), hydrocarbons derived from oxidation of CH4 (CH3, CH3O2, H2CO, CH3OOH, CO), and C2H6 and its oxidation products, including peroxyacetyl nitrate (C2H5O2, C2H5OOH, CH3CHO, CH3CO3, PAN). A list of species and boundary conditions is given in Table 1. The set of chemical reactions used in the model appears in Tables 2 and 3.

Eq. (1) is solved by finite differencing, after converting it to a set of coupled nonlinear equations of the form:

$$\frac{dx_i^j}{dt} = f_i^j\left(x_1^1, x_2^1, \ldots, x_{n_s}^1,\; x_1^2, x_2^2, \ldots, x_{n_s}^2,\; \ldots,\; x_1^{n_p}, x_2^{n_p}, \ldots, x_{n_s}^{n_p}\right) \qquad (2)$$

where x_i^j = ith species mixing ratio at altitude grid point j; f_i^j = forcing function, which is a sum of flux divergence and rates of chemical reaction; n_s = total number of chemical species; n_p = total number of spatial grid points. The mixing ratios are obtained from integration of (2).
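A drastically simplified instance of eqs. (1) and (2) shows how finite differencing turns the continuity equation into coupled equations at the grid points. The sketch assumes a single species, constant K and N, first-order loss, a uniform grid, and zero-flux conditions at both boundaries; under these linear assumptions the steady state (dx/dt = 0) is a single linear solve, whereas the full 24-species model is nonlinear and couples all species.

```python
import numpy as np

def steady_profile(z, K, p, k):
    """Steady state of 0 = K d2X/dz2 + p(z) - k X on a uniform grid.

    z : (n,) equally spaced altitudes (cm)
    K : eddy diffusion coefficient (cm^2 s^-1), taken as constant
    p : (n,) production of mixing ratio per second (s^-1)
    k : first-order loss frequency (s^-1)
    Zero-flux (dX/dz = 0) boundaries are imposed with mirrored ghost points.
    """
    n = z.size
    a = K / (z[1] - z[0]) ** 2
    M = np.zeros((n, n))
    idx = np.arange(n)
    M[idx, idx] = -2.0 * a - k
    M[idx[:-1], idx[:-1] + 1] = a        # coupling to the grid point above
    M[idx[1:], idx[1:] - 1] = a          # coupling to the grid point below
    M[0, 1] = 2.0 * a                    # ghost point X[-1] = X[1]
    M[-1, -2] = 2.0 * a                  # ghost point X[n] = X[n-2]
    return np.linalg.solve(M, -np.asarray(p, dtype=float))
```

With uniform production the zero-flux solution is simply X = p/k at every level, which is a convenient check on the boundary treatment; with a surface source only, the profile decreases monotonically with altitude.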

In performing sensitivity calculations, as for example in simulating perturbed emissions or varying reaction rate coefficients, a steady-state version of the model is used. This means simultaneous solution of eqs. (2) with dx/dt = 0, where diurnally averaged reaction rates and species concentrations are computed according to the method of Turco and Whitten [7]. Diurnally averaged rate coefficients and photolysis rates are used in the steady-state version, and the desired means are approximated:

$$\overline{k_{ij}\,x_i\,x_j} \approx (DF)_{ij}\, k_{ij}\, \bar{x}_i\, \bar{x}_j \qquad (3)$$

The reaction or loss term in eq. (3) is the product of the diurnally averaged species mixing ratios x̄_i and x̄_j and the diurnally averaged rate coefficient

$$\bar{k}_{ij} = (DF)_{ij}\, k_{ij} \qquad (4)$$

where (DF)_ij is a diurnal averaging factor and k_ij
TABLE 3

Photochemical reactions, rates, and uncertainties

Model      Bimolecular reaction                  Rate and uncertainty factors
reaction                                         A-factor                          E/R ± ΔE/R          f(298)
number
2          O + O3 → 2O2                          8.0 × 10⁻¹²                       2060 ± 250          1.15
4          O(1D) + N2 → O + N2                   1.8 × 10⁻¹¹                       −(110 ± 100)        1.2
5          O(1D) + O2 → O + O2                   3.2 × 10⁻¹¹                       −(70 ± 100)         1.2
6          NO + O3 → NO2 + O2                    2.0 × 10⁻¹²                       1400 ± 200          1.2
7          NO2 + O → NO + O2                     6.5 × 10⁻¹²                       −(120 ± 120)        1.1
8          NO2 + O3 → NO3 + O2                   1.4 × 10⁻¹³                       2500 ± 140          1.15
9          NO + NO3 → NO2 + NO2                  1.7 × 10⁻¹¹                       −(150 ± 100)        1.3
12         N2O5 + N2 → NO2 + NO3 + N2            5.7 × 10^?                        10600
13         O(1D) + H2O → OH + OH                 2.2 × 10⁻¹⁰                       0 ± 100             1.2
14         O(1D) + CH4 → OH + CH3                1.4 × 10⁻¹⁰                       0 ± 100             1.2
15         O(1D) + CH4 → H2 + H2CO               1.4 × 10⁻¹¹                       0 ± 100             1.2
16         O(1D) + H2 → OH + H                   1.0 × 10⁻¹⁰                       0 ± 100             1.2
17         H + O3 → OH + O2                      1.4 × 10⁻¹⁰                       470 ± 200           1.25
19         OH + O3 → HO2 + O2                    1.6 × 10⁻¹²                       940 ± 300           1.3
20         HO2 + O3 → OH + 2O2                   1.1 × 10⁻¹⁴                       500 +500/−100       1.3
21         OH + O → H + O2                       2.2 × 10⁻¹¹                       −(120 ± 100)        1.2
22         HO2 + O → OH + O2                     3.0 × 10⁻¹¹                       −(200 ± 100)        1.2
23         H2O2 + O → OH + HO2                   1.4 × 10⁻¹²                       2000 ± 1000         2.0
24         OH + CH4 → CH3 + H2O                  2.3 × 10⁻¹²                       1700 ± 200          1.2
25         HO2 + NO → OH + NO2                   3.7 × 10⁻¹²                       −(240 ± 80)         1.2
26         OH + CO → CO2 + H                     1.5 × 10⁻¹³ × (1 + 0.6 P_atm)     0 ± 300             1.3
27         OH + H2 → H2O + H                     5.5 × 10⁻¹²                       2000 ± 400          1.2
29         OH + HNO3 → H2O + NO3                 **                                                    1.3
30         OH + H2O2 → H2O + HO2                 3.3 × 10⁻¹²                       200 +100/−300       1.3
31         OH + HO2 → H2O + O2                   4.6 × 10⁻¹¹                       −(230 ± 200)        1.3
32         OH + OH → H2O + O                     4.2 × 10⁻¹²                       240 ± 240           1.4
33         OH + H2CO → H2O + HCO                 1.0 × 10⁻¹¹                       0 ± 200             1.25
34         HO2 + HO2 → H2O2 + O2                 2.3 × 10⁻¹³                       −(600 ± 200)        1.3
36         H + HO2 → H2 + O2                     7.3 × 10⁻¹²                       0 ± 200             1.3
37         H + HO2 → H2O + O                     3.2 × 10⁻¹²                       0 ± 200             1.3
38         H + HO2 → OH + OH                     7.0 × 10⁻¹¹                       0 ± 200             1.3
39         H2CO + O → OH + HCO                   3.4 × 10⁻¹¹                       1600 ± 250          1.25
41         CH3O2 + NO → CH3O + NO2               4.2 × 10⁻¹²                       −(180 ± 180)        1.2
42         CH3O2 + HO2 → CH3OOH + O2             1.7 × 10⁻¹³                       −(1000 ± 500)       1.3
43         CH3OOH + OH → CH3O2 + H2O             1.0 × 10⁻¹¹                       0 ± 200             2.0
44         CH3O + O2 → H2CO + HO2                3.9 × 10⁻¹⁴                       900 ± 300           1.5
45         H2 + O → OH + H                       8.8 × 10^?                        4200
47         HNO4 + M → HO2 + NO2 + M              1.0 × 10^?                        10350
48         HNO4 + OH → H2O + O2 + NO2            1.3 × 10⁻¹²                       −(380 +270/−500)    1.5
49         HCO + O2 → CO + HO2                   3.5 × 10⁻¹²                       −(140 ± 140)        1.3
50         C2H6 + OH → C2H5 + H2O                1.1 × 10⁻¹¹                       1100 ± 200          1.2
51         C2H5O2 + NO → C2H5O + NO2             4.2 × 10⁻¹²                       −(180 ± 180)        1.2
52         C2H5O + O2 → CH3CHO + HO2             1.2 × 10^?                        1350 ± 300          1.5
53         C2H5O2 + HO2 → C2H5OOH + O2           6.5 × 10⁻¹³                       −(650 ± 200)        1.3
54         CH3CHO + OH → CH3CO3 + H2O            6.0 × 10⁻¹²                       −(250 ± 200)        1.4
55         CH3CO3 + NO → CH3 + CO2 + NO2         2.4 × 10^?
57         PAN + M → CH3CO3 + NO2 + M            6.3 × 10^?
TABLE 3 (continued)

No.  Three-body reaction              Rate*: k0^300                n                k∞^300                    m
1    O + O2 + M → O3 + M              (6.0 ± 0.5) × 10^-34         2.3 ± 0.5
3    O + O + M → O2 + M               4.27 × 10^?
10   NO + O + M → NO2 + M             (9.0 ± 2.0) × 10^-32         1.5 ± 0.3        (3.0 ± 1.0) × 10^-11      0 ± 1
11   NO2 + NO3 + M → N2O5 + M         (2.2 ± 0.5) × 10^-30         4.3 ± 1.3        (1.5 ± 0.8) × 10^-12      0.5 ± 0.5
18   H + O2 + M → HO2 + M             (5.7 ± 0.3) × 10^-32         1.6 ± 0.5        (7.5 ± 4.0) × 10^-11      0 ± 1
28   OH + NO2 + M → HNO3 + M          (2.6 ± 0.3) × 10^-30         3.2 ± 0.7        (2.4 ± 1.2) × 10^-11      1.3 ± 1.3
35   OH + OH + M → H2O2 + M           (6.9 ± 3.0) × 10^-31         0.8 +2.0/-0.8    (1.0 ± 0.5) × 10^-11      1.0 ± 1.0
40   CH3 + O2 + M → CH3O2 + M         (4.5 ± 1.5) × 10^-31         2.0 ± 1.0        (1.8 ± 0.2) × 10^-12      1.7 ± 1.7
46   HO2 + NO2 + M → HNO4 + M         (1.8 ± 0.3) × 10^-31         3.2 ± 0.4        (4.7 ± 1.0) × 10^-12      1.4 ± 1.4
56   CH3CO3 + NO2 + M → PAN + M       4 × 10^? ***

* $k(T,[M]) = \frac{k_0(T)[M]}{1 + k_0(T)[M]/k_\infty(T)} \times 0.6^{\{1 + [\log_{10}(k_0(T)[M]/k_\infty(T))]^2\}^{-1}}$, where $k_0(T) = k_0^{300}(T/300)^{-n}$ and $k_\infty(T) = k_\infty^{300}(T/300)^{-m}$.
** Expression for this reaction is the sum of three terms given in ref. 8.
*** Use overall f(298) = … .


is a bimolecular rate coefficient between species i and j. The factors are determined from eq. 3 by running the time-dependent model to equilibrium, i.e. to periodic 24-hour behavior, and evaluating all the averages in (3). All the species concentrations illustrated in this study are diurnally averaged mixing ratios, x_i.

Looking at eq. 3, it is clear that the diurnal factor (DF)_ij depends on the equilibrium concentrations of species, i.e. on composition, and that as the calculated equilibrium composition changes in response to a different set of rate coefficients, the factors also change. Thus, in performing the Monte Carlo study, a time-dependent run must be carried out to obtain factors self-consistent with the diurnally averaged x_i from the steady-state calculation. The initial 'perturbed' set of rate coefficients is always run with the time-dependent model, and the diurnally averaged rates are supplied to the steady-state model for the final calculation of the diurnally averaged (or steady-state) concentrations.

The expression 'unperturbed' chemistry refers to the atmospheric composition as simulated by the model with the standard set of 72 reaction rate coefficients at their mean values (Tables 2 and 3). Atmospheric measurements are used to specify mixing ratios or flux values for NO and CO and the O3 deposition velocity. The 'unperturbed' chemical profiles simulate 'Clean Continental' northern mid-latitude regions: O3 ~ 44 ppbv, CO ~ 135 ppbv, NOx ~ 0.20 ppbv, with CH4 ~ 1.70 ppmv at the surface. Vertical profiles of O3, CO, NOx, and HNO3 appear in Fig. 1.


[Figure: Clean Continental (45°N)]

Fig. 1. Vertical profiles of O3, CO, HNO3, and NOx typical of the relatively clean continental mid-latitude troposphere. Concentrations are given in mixing ratio by volume (mole fraction).
Monte Carlo calculations

Method 

We vary the 72-reaction set of rate coefficients for each model run as follows. A set of perturbed rate coefficients is generated with a random number generator, and each perturbed run is made with a different set of 72 rate coefficients. The perturbations are based on the uncertainties in chemical rates given in the JPL/NASA Panel 8 Evaluation (8), and the spread of the resulting model runs is used to derive the corresponding uncertainties in the species concentrations.
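As a minimal sketch of the perturb-and-rerun loop described above, the Python below replaces the 72-reaction photochemical model with a hypothetical one-species toy model (steady state x = P/L) driven by two hypothetical rate coefficients. Only the structure is meant to mirror the method: every run gets a fresh random draw of every rate, and statistics are taken over the ensemble of outputs. The function names and numerical values are illustrative assumptions.

```python
import math
import random
import statistics


def sample_lognormal(median: float, f: float, rng: random.Random) -> float:
    """Draw a rate with the given median and 1-sigma uncertainty factor f (ln k ~ N(ln median, ln f))."""
    return median * math.exp(rng.gauss(0.0, math.log(f)))


def toy_model(k_prod: float, k_loss: float) -> float:
    """Hypothetical stand-in for the photochemical model: steady state x = P / L."""
    return k_prod / k_loss


def monte_carlo(n_runs: int, seed: int = 1) -> list:
    """Run the toy model once per independently perturbed set of rate coefficients."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_runs):
        k_prod = sample_lognormal(1.0e-3, 1.4, rng)  # hypothetical rate, f = 1.4
        k_loss = sample_lognormal(2.0e-5, 1.3, rng)  # hypothetical rate, f = 1.3
        outputs.append(toy_model(k_prod, k_loss))
    return outputs


runs = monte_carlo(800)
spread = statistics.stdev([math.log(x) for x in runs])
print(f"1-sigma spread of ln(x) over {len(runs)} runs: {spread:.3f}")
```

In this toy model ln x = ln P - ln L, so the spread of ln x should approach sqrt((ln 1.4)^2 + (ln 1.3)^2) ≈ 0.43. The real model has no such closed form, which is why repeated runs are needed.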

The perturbed reactions are used in the time-dependent version of the model, which is integrated for two days to produce diurnally averaged rates (4) and mixing ratios. These mixing ratios are not 'converged' to equilibrium, in that the 24-hour cycle of each species is not periodic. Achieving periodicity would take many days of integration because several constituents (e.g. O3, CO, and PAN) have photochemical lifetimes of over a week. This is not computationally practical, because each day of integration takes several minutes on the VAX 11/780 and attached processor.

We have compared diurnally averaged rates computed after two-day and ten-day time-dependent model runs. The maximum difference, as a percentage of the imprecision, occurs for the photolysis of N2O5 (rate Ju in Table 2) and is 1.6%. Only one other rate (rate 9 in Table 3) has a percentage difference as great as 1%. We do not expect the variances in species concentrations computed over a set of model runs to be sensitive to small errors in the averaged rates of each individual model run, so this approximate averaging should be adequate.

Assignment  of  rate  coefficient  uncertainties 

Most of the 72 reactions used in the photochemical model have an associated uncertainty given by the NASA panel evaluation (8). As noted in that report, the assigned uncertainties are subjective judgments of the panel and are not based on rigorous statistical analysis, because too few laboratory investigations have been carried out.

We have assumed that the uncertain parameters entering into reaction rate calculations have simple probability density functions: lognormal if the parameter is intrinsically positive, Gaussian otherwise. At the beginning of a model run, values are selected from these distributions for each parameter entering into the reaction rate. Each run gives different values for the concentrations, corresponding to the randomly selected rates for that run. After a sufficiently large number of trials (runs), histograms showing the percentage deviation of each species concentration from its mean over all runs are obtained numerically for each species. We show results after 800 runs; at this point the computed means and variances for each species are nearly constant as runs are added. The largest change in the ratio of the standard deviation to the mean, relative to the 700-run results, is for C2H5OOH, which changed by 1.6% after 800 runs. Similar calculations for stratospheric chemistry carried out by Stolarski and coworkers show that convergence to 1-2% is obtained after ~1000 runs (9,10).
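The convergence criterion above, the change in the ratio of standard deviation to mean as runs are added, can be checked with a few lines of Python. This is a sketch on a hypothetical lognormal ensemble rather than on model output. The 700/800 split follows the text, and the helper names are our own.

```python
import random
import statistics


def cv(values) -> float:
    """Ratio of the sample standard deviation to the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


def cv_change_percent(values, n_before: int, n_after: int) -> float:
    """Percent change in std/mean when the ensemble grows from n_before to n_after runs."""
    before = cv(values[:n_before])
    after = cv(values[:n_after])
    return 100.0 * abs(after - before) / before


rng = random.Random(7)
# Hypothetical ensemble: one lognormally scattered concentration per model run.
ensemble = [50.0 * 1.3 ** rng.gauss(0.0, 1.0) for _ in range(800)]
change = cv_change_percent(ensemble, 700, 800)
print(f"std/mean changed by {change:.2f}% between 700 and 800 runs")
```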

Most of the reactions used in the photochemical model fall into one of three categories (photolysis, bimolecular, and termolecular), as noted in the discussion following eq. (1). Ref. 8 states the uncertainties in reaction rates differently for each category, so each requires a somewhat different treatment.

Uncertainties in the photolysis rates used in the calculations are given as an overall fractional uncertainty in the rate, rather than as measurement uncertainties in the various fluxes, cross sections, and quantum yields which determine these rates (8). The photodissociation reactions are given in Table 2. We have assumed a lognormal distribution for the photolysis rates, with a standard deviation corresponding to the stated fractional uncertainty for each.
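Below is a minimal sketch of the lognormal assumption for photolysis rates. It assumes the stated fractional uncertainty is read as a multiplicative 1-sigma factor f, so a 40% uncertainty becomes f = 1.4, in line with the f(298) convention used for Table 3. The mean J value is a hypothetical placeholder. Drawing 800 samples and exponentiating the standard deviation of ln J recovers f, which is the kind of check reported for Fig. 2a.

```python
import math
import random
import statistics


def perturb_photolysis(j_mean: float, f: float, rng: random.Random) -> float:
    """Lognormal draw: ln J ~ N(ln j_mean, ln f); f = 1.4 means a 40% 1-sigma factor."""
    return j_mean * math.exp(rng.gauss(0.0, math.log(f)))


def recovered_factor(samples) -> float:
    """Estimate the uncertainty factor back from the spread of ln J."""
    return math.exp(statistics.stdev([math.log(s) for s in samples]))


rng = random.Random(42)
j_o1d = 2.0e-5  # hypothetical J(O3 -> O(1D)) in s^-1, for illustration only
draws = [perturb_photolysis(j_o1d, 1.4, rng) for _ in range(800)]
print(f"recovered f = {recovered_factor(draws):.3f}")
```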

Most second-order rates are obtained from the product of a rate coefficient and an exponential factor containing the activation energy. The general expression for binary rates is

$$k(T) = A\,\exp(-E/RT) \qquad (5)$$

where k(T) is the overall reaction rate, A is the rate coefficient multiplying the exponential factor, E is the activation energy, R the gas constant, and T the temperature. JPL/NASA (8) give uncertainties in the activation energy, ΔE, as well as an uncertainty, f(298), in the overall rate at 298 K. The overall uncertainty at other temperatures is calculated from the expression

$$f(T) = f(298)\,\exp\left|\frac{\Delta E}{R}\left(\frac{1}{T} - \frac{1}{298}\right)\right| \qquad (6)$$
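Eq. (6) is simple to evaluate. The function below transcribes it directly and applies it to reaction 26 (OH + CO: f(298) = 1.3 and Δ(E/R) = 300 K, from Table 3) at the 288 K surface temperature. The function name is ours.

```python
import math


def uncertainty_factor(f298: float, delta_e_over_r: float, temperature: float) -> float:
    """Eq. (6): f(T) = f(298) * exp(|(dE/R) * (1/T - 1/298)|)."""
    return f298 * math.exp(abs(delta_e_over_r * (1.0 / temperature - 1.0 / 298.0)))


# OH + CO (reaction 26) evaluated at the 288 K model surface.
print(round(uncertainty_factor(1.3, 300.0, 288.0), 3))  # 1.346
```

The factor grows as T moves away from 298 K in either direction. That is why evaluating it at 288 K gives smaller uncertainties than the colder temperatures aloft would.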

For the purpose of generating perturbed binary rates for a Monte Carlo series of model runs, we assume that the overall uncertainty given in the JPL/NASA Panel 8 tabulation arises from an uncertainty in the rate coefficient A in eq. 5, with A lognormally distributed. The temperature-dependent factor in eq. 5 is always evaluated at the standard value of the activation energy. Table 3 follows the JPL/NASA (8) convention, in which f(298) = 1.2 signifies a 1-sigma uncertainty of 20%. Note that the column labeled f(298) in Table 3 is the overall uncertainty, which is not necessarily identical to the value that would be computed from the stated uncertainty in activation energy. Temperatures in the one-dimensional model decrease with altitude, and we have chosen to evaluate the binary rate uncertainties in eq. 6 at the surface temperature of 288 K. This is a conservative assumption in that it gives smaller rate uncertainties, and hence smaller uncertainties in model mixing ratios, than the colder temperatures aloft would give. It is, however, reasonably good for evaluating uncertainties in the boundary layer, which is our primary interest.
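Combining eqs. (5) and (6), one Monte Carlo draw of a binary rate can be sketched as follows. The sketch follows the assumptions stated above: only A is perturbed, A is lognormal with σ = ln f(288 K), and E/R is held at its standard value. The inputs are the Table 3 entry for reaction 24 (OH + CH4: A = 2.3 × 10^-12, E/R = 1700 ± 200 K, f(298) = 1.2). The function names are hypothetical.

```python
import math
import random
import statistics


def uncertainty_factor(f298, delta_e_over_r, temperature):
    """Eq. (6): f(T) = f(298) exp|(dE/R)(1/T - 1/298)|."""
    return f298 * math.exp(abs(delta_e_over_r * (1.0 / temperature - 1.0 / 298.0)))


def perturbed_binary_rate(a, e_over_r, f298, delta_e_over_r, temperature, rng, t_eval=288.0):
    """One Monte Carlo draw of k(T) = A exp(-(E/R)/T).

    Only A is perturbed (lognormal, sigma = ln f(t_eval)); E/R stays at its standard value.
    """
    sigma = math.log(uncertainty_factor(f298, delta_e_over_r, t_eval))
    a_draw = a * math.exp(rng.gauss(0.0, sigma))
    return a_draw * math.exp(-e_over_r / temperature)


rng = random.Random(3)
draws = [perturbed_binary_rate(2.3e-12, 1700.0, 1.2, 200.0, 288.0, rng) for _ in range(800)]
nominal = 2.3e-12 * math.exp(-1700.0 / 288.0)
print(f"median/nominal = {statistics.median(draws) / nominal:.3f}")
```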

The general expression used to evaluate termolecular rates is more complicated (Table 3). The general form of a termolecular reaction is A + B + M → AB + M, where M is a quenching third body. The low-pressure and high-pressure limiting rates, k0 and k∞, are given in the form

$$k_0(T) = k_0^{300}\,(T/300)^{-n}, \qquad k_\infty(T) = k_\infty^{300}\,(T/300)^{-m} \qquad (7)$$

and these are combined into a rate expression applicable to general conditions of atmospheric temperature and pressure by

$$k(T,[M]) = \frac{k_0(T)[M]}{1 + k_0(T)[M]/k_\infty(T)} \times 0.6^{\left\{1 + \left[\log_{10}\left(k_0(T)[M]/k_\infty(T)\right)\right]^2\right\}^{-1}} \qquad (8)$$
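Below, eqs. (7) and (8) are transcribed into Python as a sketch. The function returns the effective second-order rate coefficient for a given temperature and third-body concentration [M]. The example uses the reaction 28 parameters from Table 3. The ideal-gas estimate of [M] at 288 K and 1 atm is our assumption, not a model value.

```python
import math


def termolecular_rate(k0_300, n, kinf_300, m, temperature, m_conc):
    """Eqs. (7)-(8); cm^3 s^-1 when k0 is in cm^6 s^-1, kinf in cm^3 s^-1 and [M] in cm^-3."""
    k0 = k0_300 * (temperature / 300.0) ** (-n)      # low-pressure limit, eq. (7)
    kinf = kinf_300 * (temperature / 300.0) ** (-m)  # high-pressure limit, eq. (7)
    x = k0 * m_conc / kinf
    broadening = 0.6 ** (1.0 / (1.0 + math.log10(x) ** 2))
    return k0 * m_conc / (1.0 + x) * broadening


# OH + NO2 + M -> HNO3 + M (reaction 28) at 288 K and ~1 atm of air.
air = 101325.0 / (1.380649e-23 * 288.0) * 1e-6  # molecules cm^-3
print(f"{termolecular_rate(2.6e-30, 3.2, 2.4e-11, 1.3, 288.0, air):.2e}")
```

Two limits serve as a check. When k0[M] equals k∞, the expression reduces to 0.3 k∞. At very low [M] it approaches k0[M], apart from the slowly varying broadening factor.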

The factor [M] in eq. (8) is the concentration of third bodies involved in the termolecular reactions, specified by the model as the sum O2 + N2. Uncertainties are given for the coefficients $k_0^{300}$ and $k_\infty^{300}$ and for the exponents n and m in the temperature-dependent factors. Since $k_0^{300}$ and $k_\infty^{300}$ must be positive, they are assumed to be lognormally distributed, but the exponents n and m may be assumed normally distributed. The overall rate is thus a function of four random variables. Unlike the binary rate, its distribution does not follow immediately from these assumptions, though the rate is clearly always positive.
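The sketch below draws the four random variables for reaction 28. It shows that the perturbed termolecular rate stays positive even though its distribution is not simply lognormal. The text does not say how a tabulated (value ± Δ) maps onto a lognormal, so σ = ln(1 + Δ/value) is an assumption here, as are the function names and the 1 atm, 288 K estimate of [M].

```python
import math
import random
import statistics


def termolecular_rate(k0_300, n, kinf_300, m, temperature, m_conc):
    """Eqs. (7)-(8)."""
    k0 = k0_300 * (temperature / 300.0) ** (-n)
    kinf = kinf_300 * (temperature / 300.0) ** (-m)
    x = k0 * m_conc / kinf
    return k0 * m_conc / (1.0 + x) * 0.6 ** (1.0 / (1.0 + math.log10(x) ** 2))


def perturbed_termolecular(rng, k0_300, dk0, n, dn, kinf_300, dkinf, m, dm, temperature, m_conc):
    """Assumption: (value +/- delta) -> lognormal with sigma = ln(1 + delta/value);
    the exponents n and m are drawn as normals."""
    k0 = k0_300 * math.exp(rng.gauss(0.0, math.log(1.0 + dk0 / k0_300)))
    kinf = kinf_300 * math.exp(rng.gauss(0.0, math.log(1.0 + dkinf / kinf_300)))
    return termolecular_rate(k0, rng.gauss(n, dn), kinf, rng.gauss(m, dm), temperature, m_conc)


rng = random.Random(11)
air = 101325.0 / (1.380649e-23 * 288.0) * 1e-6  # molecules cm^-3 at 288 K, 1 atm
# OH + NO2 + M (reaction 28): k0 = (2.6 +/- 0.3)e-30, n = 3.2 +/- 0.7,
# kinf = (2.4 +/- 1.2)e-11, m = 1.3 +/- 1.3.
samples = [
    perturbed_termolecular(rng, 2.6e-30, 0.3e-30, 3.2, 0.7, 2.4e-11, 1.2e-11, 1.3, 1.3, 288.0, air)
    for _ in range(800)
]
nominal = termolecular_rate(2.6e-30, 3.2, 2.4e-11, 1.3, 288.0, air)
print(min(samples) > 0, statistics.median(samples) / nominal)
```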


RESULTS  AND  DISCUSSION 
Reaction  rate  uncertainties 

Fig. 2 shows the variability in some of the reaction rates important in the odd-hydrogen balance of the troposphere. Fig. 2a shows the distribution in the rate of photolysis of ozone to produce O(1D), which initiates most tropospheric photochemistry:

O3 + hν (λ = 295-310 nm) → O(1D) + O2

The stated uncertainty in this rate is 40%, and a lognormal distribution is assumed for photolysis reactions. Here apparent lognormality and an uncertainty close to the one given in JPL/NASA (8) are recovered from the numerical results. Fig. 2b shows the distribution of the rate of the O(1D) + H2O reaction, which is the primary source of tropospheric OH.

Fig. 2c shows the distribution of the termolecular rate for the reaction OH + OH + M → H2O2 + M, which forms hydrogen peroxide. Although the distribution appears somewhat skewed towards positive values, we cannot characterize it as lognormal since, as noted previously, it results from a relatively complicated relationship among four random variables. Indeed, the termolecular distributions we have examined appear to be more symmetric about their means than they would be if strictly lognormal.

Fig. 2. Histograms showing distributions of several reaction rates affecting the concentration of OH, calculated at 0 km (288 K). Statistics are based on 800 model runs. (a) O3 UV photolysis, in s-1; (b) the major formation reaction for OH, with mean rate 2.2 × 10^-10 cm3 s-1; (c) ternary rate for OH + OH recombination, with mean rate ≈ 1.6 × 10^? cm6 s-1.

Fig. 3. Histograms of odd-hydrogen species distributions at 0 km after 800 model runs.

Computed constituent uncertainties

Fig. 3 shows the calculated variability in the odd-hydrogen species OH and HO2 and in hydrogen peroxide, H2O2. All deviations greater than 100%
above the mean are placed in the rightmost vertical bar of these histogram plots. We expect the variances of OH and HO2 to be relatively large, since these species participate in more reactions than any other species. Hydrogen peroxide is readily absorbed in cloud droplets and may be an important component in the liquid-phase production of sulfate and the consequent decrease in droplet pH (11,12). Wet removal of H2O2 is included in our model continuity equations for H2O2 as a first-order rate coefficient, but this rate is not varied. We have previously explored the possibility of increases in future peroxide levels resulting from projected changes in methane and CO emissions and from possible climate changes (3,13). We estimate the global change in H2O2 in response to continuing CO and CH4 increases of 0.5-1%/yr to be about 20% over the next fifty years (13). The present study would imply that this change is smaller than the model's precision for computing H2O2 under a given set of conditions. Fortunately, key species (O3, CO) can be measured in the atmosphere to better precision than we compute in the Monte Carlo study. This suggests that we can improve on the calculated uncertainties for all species by constraining the model with observations (5).

Fig. 4 shows the calculated variability in members of the odd-nitrogen family: nitric oxide (NO), nitrogen dioxide (NO2), and nitric acid (HNO3). The uncertainty in HNO3 is one of the smallest occurring in our calculations. As for H2O2, uncertainties in HNO3 due to rainout are not included in this study. These are likely to increase the HNO3 variance substantially (14).

Fig. 5 shows the calculated variability of ozone (O3) and carbon monoxide (CO). These species are less reactive than free radicals, peroxides, or acids. We expect smaller variability in O3 due to rate uncertainties because external sources of O3, as well as chemical reactions, are important to its atmospheric distribution. A fixed flux of ozone into the troposphere is assumed at the tropopause. The uncertainty at the surface is 17%. The uncertainty for CO is higher, ~31%, even though the upward flux of CO from the surface is very important to boundary-layer CO. The reason is the 30% fractional uncertainty in the rate of the reaction OH + CO, which is the major CO sink. When model runs are performed without varying this reaction rate, the variation in CO is less than 20%.

[Figure 4: histogram panels (a)-(c); horizontal axis: deviation from mean concentration (%), -100 to 100]

Fig. 4. Histograms of odd-nitrogen species distributions at 0 km after 800 model runs.

[Figure 5: (a) Ozone; (b) Carbon Monoxide]

Fig. 5. Vertical profiles (0-12 km) based on 800 simulations for (a) O3 and (b) CO. The mean is indicated by the central vertical profile; shading indicates the 1-sigma deviation from the mean.


SUMMARY 

Estimated imprecisions in chemical reaction rates important in tropospheric photochemistry have been used to estimate the resulting uncertainty in model-calculated trace species distributions. A Monte Carlo approach is used, with tabulated kinetic imprecisions specified for 72 reactions. The model reproduces the tabulated imprecisions closely after several hundred runs, and the propagated uncertainty in 24 trace constituents is calculated. Uncertainties for ozone and carbon monoxide are 17% and 31%, respectively. For CO this is 2-3 times greater than the imprecision which typically affects CO measurements in the atmosphere.

Odd-nitrogen uncertainties are ~20% for NO and NO2 and only 6% for HNO3, because imprecision in precipitation scavenging, an important loss for nitric acid, has not been included in the study. The hydroxyl radical (OH) has a computed uncertainty of 31%, which somewhat limits the model's capability for precise evaluation of oxidant changes.

In a related study (5) we report on correlation analysis between rates and species, used to identify the reactions which contribute most to the variance of selected species. This also helps in developing in situ measurement strategies to reduce the overall computational variance found in the present study. It further helps to identify the photochemical processes towards which further laboratory investigation might be most effectively directed.
Abstract


Jurs, P.C. and Lawson, R.G., 1991. Analysis of chemical structure-biological activity relationships using clustering methods. Chemometrics and Intelligent Laboratory Systems, 10: 81-83.

The  importance  of  calculating  clustering  tendency  of  a  data  set  as  part  of  a  complete  methodology  is  described.  A  new  method  for 
evaluating  the  clustering  tendency  is  illustrated  with  artificially  clustered,  random,  and  actual  chemical  data  sets.  This  new  index  is 
shown  to  be  more  useful  than  the  original  one. 


Cluster analysis is a useful and increasingly popular method for exploring data represented in high-dimensional spaces. Questions that can be approached using cluster analysis arise in pharmaceutical and agricultural chemistry in the context of structure-activity relationships (SAR). For example, a common exploratory approach to SAR is to retrieve the compounds which have a particular structural fragment from a large database of compounds. It is then of interest to seek subsets of compounds with structural similarities, that is, clusters. Other examples come from toxicology, where it is of interest to examine sets of compounds for structural similarities so that these similarities can be related to toxicity. A third example involves examining a number of possible conformations for complex structures, as provided by a molecular mechanics routine, to see if they fall into natural subgroupings.

The exploration of multivariate data via clustering involves many steps: data collection, initial screening of the variables, exploration of clustering tendency, application of clustering strategies, and validation and interpretation of the results. Often the entire process is iterative. Once a data set has been selected for analysis, it is important to examine its clustering tendency before developing clusters, because this allows the experimenter to be sure that the clustering exercise has a chance of finding real clusters. Most algorithms designed to find clusters will find some regardless of the structure of the data. This work focuses on the evaluation of clustering tendency via the Hopkins statistic and a recently proposed variation of it. The Hopkins statistic has previously been shown to be a very good method for assessing clustering tendency (1).

The Hopkins statistic (2,3) is intended to assess whether or not a given data set differs from a set of uniform random numbers. The statistic is calculated with the following equation:

$$H = \frac{\sum_i U_i}{\sum_i U_i + \sum_i W_i}$$

where the U_i are random-to-real distances and the W_i are real-to-real distances. Each U_i value is the distance from a randomly selected position within the sampling window to the nearest data point, and each W_i value is the distance from a randomly selected data point to its nearest neighboring data point. The sums run over the sampling points, whose number is usually set at 5% to 10% of the number of points in the data set. The U_i positions (the sampling points) are chosen from a uniform distribution within the sampling window. H has values near 1/2 for unclustered data, that is, data with a uniform distribution. H has values greater than 1/2 for clustered data, and 1.0 is the upper limit for extremely clustered data. Under reasonable assumptions, H has a beta distribution, so the probability for rejection of the null hypothesis (no clustering) can be stated quantitatively. For example, for 15 sampling points and a value H = 0.65, the probability of rejection of the null hypothesis is 0.90.
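Below is a minimal Python sketch of the statistic defined above. Two choices are assumptions rather than details from the paper: the sampling window is taken to be the axis-aligned bounding box of the data, and 10% of the points are sampled. The nearest-neighbor search is brute force, which is adequate for a few hundred points.

```python
import math
import random


def nearest_distance(point, data, skip=None):
    """Euclidean distance from `point` to the closest member of `data` (ignoring index `skip`)."""
    return min(math.dist(point, q) for k, q in enumerate(data) if k != skip)


def hopkins(data, fraction=0.1, seed=None):
    """H = sum(U) / (sum(U) + sum(W)).

    U_i: uniform random position in the sampling window -> nearest data point.
    W_i: randomly chosen data point -> its nearest neighboring data point.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    m = max(1, round(fraction * n))
    lo = [min(p[j] for p in data) for j in range(dim)]  # bounding-box window (assumption)
    hi = [max(p[j] for p in data) for j in range(dim)]
    u = [nearest_distance([rng.uniform(lo[j], hi[j]) for j in range(dim)], data) for _ in range(m)]
    w = [nearest_distance(data[k], data, skip=k) for k in rng.sample(range(n), m)]
    return sum(u) / (sum(u) + sum(w))


gen = random.Random(0)
uniform_pts = [(gen.random(), gen.random()) for _ in range(200)]
clustered = [(c + gen.gauss(0, 0.1), c + gen.gauss(0, 0.1)) for c in (0.0, 10.0) for _ in range(100)]
h_uniform = hopkins(uniform_pts, seed=1)
h_clustered = hopkins(clustered, seed=1)
print(h_uniform, h_clustered)
```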

The  ordinary  Hopkins  statistic  has  several 
shortcomings.  One  is  its  sensitivity  to  the  size  of 
the  sampling  window  and  hence  to  outliers. 
Another  is  that  the  criterion  of  comparison  to  a 
uniform  distribution  is  weak  since  almost  any 
measured  or  calculated  data  will  be  more  clustered 
than  the  uniform  distribution. 

We have investigated (4) a modified form of the Hopkins statistic, H', designed to overcome these shortcomings. Instead of choosing the sampling points from a uniform distribution, we choose them from the actual univariate distributions of the data under investigation. This allows us to investigate whether the clustering tendency observed for the data set is due to the multivariate nature of the observations or only to the univariate distributions of the variables.
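One simple reading of "choose them from the actual univariate distributions" is to build each sampling point coordinate by coordinate, copying each coordinate from an independently chosen observation. This is equivalent to sampling from scrambled data. The sketch below implements that reading, and the original work (4) may use a different sampling scheme. When the clusters arise only from independent bimodal variables, H' stays well below 1. Two clusters along the diagonal, which are genuine multivariate structure, drive it towards 1.

```python
import math
import random


def nearest_distance(point, data, skip=None):
    """Euclidean distance from `point` to the closest member of `data` (ignoring index `skip`)."""
    return min(math.dist(point, q) for k, q in enumerate(data) if k != skip)


def modified_hopkins(data, fraction=0.1, seed=None):
    """H': sampling points drawn from the empirical univariate distribution of each variable."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    m = max(1, round(fraction * n))
    u = []
    for _ in range(m):
        # Each coordinate comes from an independently chosen observation, so the probe
        # follows the product of the marginals but not their joint dependence.
        probe = [data[rng.randrange(n)][j] for j in range(dim)]
        u.append(nearest_distance(probe, data))
    w = [nearest_distance(data[k], data, skip=k) for k in rng.sample(range(n), m)]
    return sum(u) / (sum(u) + sum(w))


gen = random.Random(0)
# Clustering produced only by independent bimodal marginals (four corner clusters).
independent = [(gen.choice((0.0, 10.0)) + gen.gauss(0, 0.1),
                gen.choice((0.0, 10.0)) + gen.gauss(0, 0.1)) for _ in range(200)]
# Genuinely multivariate clustering: two clusters on the diagonal.
diagonal = [(c + gen.gauss(0, 0.1), c + gen.gauss(0, 0.1))
            for c in (0.0, 10.0) for _ in range(100)]
print(modified_hopkins(independent, seed=1), modified_hopkins(diagonal, seed=1))
```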


Tests of this modified Hopkins statistic with two-dimensional and ten-dimensional artificial data sets designed to be extremely clustered, and with an eight-dimensional chemical data set, show the modified statistic to be more conservative in its estimation of clustering than the original Hopkins statistic. The modified statistic is also not sensitive to outliers.

The chemical example used for testing the modified Hopkins statistic consists of 143 acrylate compounds with the general structure shown. This data set was analyzed in the context of a structure-toxicity relationship investigation (5). Each of the 143 acrylates was represented by a set of eight calculated structural descriptors which were chosen to best represent the structures. A principal components plot of the data shows no apparent clustering. However, the data do show substantial clustering tendency with the original Hopkins statistic: H = 0.82. When the original Hopkins statistic was calculated for scrambled data, H = 0.77. This shows that a substantial part of the clustering tendency is due to the univariate distributions of the eight structural descriptors. The value of the modified Hopkins statistic was H' = 0.65. This shows that the multivariate data contain more information than merely their univariate distributions. The data set was then analyzed using the well-known K-means and Isodata clustering methods, and five stable clusters were found. Knowledgeable chemists and toxicologists who examined the structures of the compounds in each class found that these five clusters made good sense.



The modified Hopkins statistic can also be used for feature selection, that is, for selecting the variables which support clustering in a data set. Preliminary studies have shown that partial sums of U_i and W_i can be used effectively to delete the least useful variables, thereby focusing on the variables that best support clustering.


Original Research Paper




References

1 G. Zeng and R.C. Dubes, A comparison of tests for randomness, Pattern Recognition, 18 (1985) 191-198.

2 B. Hopkins, A new method for determining the type of distribution of plant individuals, Annals of Botany, 18 (1954) 213-227.

3 A. Jain and R. Dubes, Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ, 1988, pp. 136-137.

4 R.G. Lawson and P.C. Jurs, New index for clustering tendency and its application to chemical problems, Journal of Chemical Information and Computer Sciences, 30 (1990) 36-41.

5 R.G. Lawson and P.C. Jurs, Cluster analysis of acrylates to guide sampling for toxicity testing, Journal of Chemical Information and Computer Sciences, 30 (1990) 137-144.


Discussion




Chemometrics and Intelligent Laboratory Systems, 10 (1991) 85-86
Elsevier Science Publishers B.V., Amsterdam


Comments on “Analysis of chemical structure-biological activity relationships using clustering methods” by Peter C. Jurs and Richard G. Lawson

Leon Jay Gleser

Department of Mathematics and Statistics, University of Pittsburgh, Pittsburgh, PA 15260 (U.S.A.)


Cluster analysis shares with other scaling methods (such as principal components and factor analysis) the idea that there is an underlying structure which influences the observed variables but is not entirely revealed by them. This underlying structure may also be more highly predictive of other phenomena than the observed variables themselves. For example, structural similarities within sets of compounds may reflect underlying chemical structures that are related to the biological toxicity of these compounds. Thus, clustering is seen as a valid alternative to regression analysis as a way of predicting these other phenomena (e.g., toxicity). Once clusters have been identified, analysis of variance can be used to demonstrate their predictive ability. A similar approach has been used in educational research to find predictors of improvement in the mathematics achievement of junior high school students [1].

Although the modified Hopkins statistic discussed by Jurs and Lawson [2] can be useful in determining whether multivariate data reflect underlying clusters, it has some disadvantages. One disadvantage is that this statistic depends heavily on the scales of the measured variables. Simply changing the scale of measurement of any single variable will change the value of H. More generally, the value of H is affected by the standard deviations of the variables being considered. (Principal component analysis shares this disadvantage: scale changes can influence principal values and principal vectors in complex ways.) Lawson and Jurs [3] are aware of this problem and standardize their variables before clustering. However, if sample standard deviations rather than the actual population standard deviations are used for standardization, the Hopkins statistic no longer necessarily has a beta distribution, even if reasonably large samples are used to estimate the standard deviations.
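The scale dependence described above is easy to reproduce. The sketch below assumes a bounding-box sampling window and a 10% sampling fraction. With these choices, rescaling one variable changes H, while z-scoring each variable with its sample mean and sample standard deviation before computing H makes the result insensitive to that rescaling. The sketch does not address the second point, that the null distribution of H is then no longer exactly beta.

```python
import math
import random
import statistics


def nearest_distance(point, data, skip=None):
    """Euclidean distance from `point` to the closest member of `data` (ignoring index `skip`)."""
    return min(math.dist(point, q) for k, q in enumerate(data) if k != skip)


def hopkins(data, fraction=0.1, seed=None):
    """Hopkins H with a bounding-box sampling window (an assumption)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    m = max(1, round(fraction * n))
    lo = [min(p[j] for p in data) for j in range(dim)]
    hi = [max(p[j] for p in data) for j in range(dim)]
    u = [nearest_distance([rng.uniform(lo[j], hi[j]) for j in range(dim)], data) for _ in range(m)]
    w = [nearest_distance(data[k], data, skip=k) for k in rng.sample(range(n), m)]
    return sum(u) / (sum(u) + sum(w))


def standardize(data):
    """Z-score each variable using the sample mean and sample standard deviation."""
    stats = [(statistics.fmean(col), statistics.stdev(col)) for col in zip(*data)]
    return [tuple((v - mu) / sd for v, (mu, sd) in zip(p, stats)) for p in data]


gen = random.Random(5)
pts = [(gen.random(), gen.random()) for _ in range(200)]
rescaled = [(1000.0 * x, y) for x, y in pts]  # same data, first variable in new units

h_raw, h_rescaled = hopkins(pts, seed=2), hopkins(rescaled, seed=2)
h_std, h_std_rescaled = hopkins(standardize(pts), seed=2), hopkins(standardize(rescaled), seed=2)
print(h_raw, h_rescaled)       # generally differ
print(h_std, h_std_rescaled)   # agree up to rounding error
```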

A second possible disadvantage of the modified Hopkins statistic is that it concentrates on clustering as a multivariate phenomenon (i.e., due to dependence among the variables). This excludes from consideration clusters that can form in the multivariate space because the individual variables themselves show clustering (multimodality) in their marginal distributions while remaining independent. Since the ideal in scaling is a latent structure which relates the observed values, and this latent structure is of primary interest, this may not be a serious defect. However, it does raise the conceptual question of what constitutes a cluster.

It is to Jurs and Lawson's credit that they have eliminated the major disadvantage of the original Hopkins statistic, namely, the insistence on assuming that unclustered variables were independent and uniformly distributed. Few variables encountered in nature have uniform distributions. Although it is possible to transform marginal distributions so that they are uniform (by the probability integral transformation performed variable by variable), such transformations destroy comparability of distances to nearest neighbors. It is these distances between data points that most intuitively convey the notion of ‘clustering’. (If all that is meant by clustering were lack of independence, then tests of independence based on either the Kolmogorov-Smirnov distance between multivariate distributions or Pearson chi-squared tests of independence based on grouped data in contingency tables could be used. The distances utilized by these tests have little resemblance to Euclidean distances between data points.)

0169-7439/91/$03.50 © 1991 Elsevier Science Publishers B.V.
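For reference, the original Hopkins statistic discussed above can be sketched as follows. This is a minimal NumPy sketch of one common form of the statistic (conventions differ on which distance set goes in the numerator). The function name `hopkins`, the sample size `m`, the bounding-box sampling window and the example data are illustrative assumptions, and the Jurs and Lawson modification is not reproduced. The uniform pseudo-points are exactly where the uniformity assumption criticized in the text enters.

```python
import numpy as np

def hopkins(X, m=None, rng=None):
    """One common form of the original Hopkins statistic (a sketch).

    H = sum(u) / (sum(u) + sum(w)), where
      u: distances from m points drawn uniformly in the bounding box of X
         to their nearest data point,
      w: distances from m randomly chosen data points to their nearest
         *other* data point.
    H near 0.5 is consistent with uniform scatter; H near 1 suggests clustering.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    m = m or max(1, n // 10)
    lo, hi = X.min(axis=0), X.max(axis=0)

    # Uniform pseudo-points: this is where the uniformity assumption enters.
    U = rng.uniform(lo, hi, size=(m, d))
    u = np.linalg.norm(U[:, None, :] - X[None, :, :], axis=2).min(axis=1)

    # Sampled data points: nearest neighbour among the remaining data points.
    idx = rng.choice(n, size=m, replace=False)
    dw = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=2)
    dw[np.arange(m), idx] = np.inf          # ignore each point's zero self-distance
    w = dw.min(axis=1)

    return u.sum() / (u.sum() + w.sum())

# Hypothetical example data: two tight clusters versus uniform scatter.
rng = np.random.default_rng(0)
clustered = np.vstack([rng.normal(0.0, 0.01, (100, 2)),
                       rng.normal(1.0, 0.01, (100, 2))])
uniform = rng.uniform(0.0, 1.0, (200, 2))
print(hopkins(clustered, m=50, rng=1), hopkins(uniform, m=50, rng=1))
```

With real chemical descriptors, which are rarely uniform, the reference distribution built into `U` is the weak point that the modified statistic addresses.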

Besides use of the Hopkins statistic, there are other ways that the ‘reality’ of observed clusters can be demonstrated. Using more than one distinct method for searching for clusters (e.g., K-means and Isodata, as used by Jurs and Lawson (2) in their chemical data) is one good method: if different search methods arrive at similar (numbers of) clusters, one can be less worried that the clusters are artifacts of a particular search method. Additionally, one can hold back a randomly selected subset of variables in an initial clustering search, and then see if adding these variables changes the conclusions. (This approach assumes that no small subset of variables by itself defines the true underlying clusters.) Instead of withholding variables, one can randomly divide data points (cross-validation) into two or more groups and see if similar clusters arise in such data sets. This approach is associated with a formal statistical theory that is currently discussed under the terminology ‘bootstrap analysis’ (4,5). There is also a resemblance between the subsampling of data used in the Hopkins test and the resampling methods used in bootstrap analysis. Finally, as demonstrated by Jurs and Lawson, one can see if the clusters found make sense in the light of existing chemical (and biological, in the case of toxicity) knowledge. If the clusters successfully predict other phenomena (e.g., toxicity), this is further evidence that such clusters are not artifacts of the data.
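The split-sample check described above (randomly dividing the data points and asking whether similar clusters arise in each part) can be sketched in a few lines. This is an illustrative sketch only: `kmeans` is a plain textbook Lloyd iteration with a deterministic farthest-point start, not the K-means or Isodata implementations used by Jurs and Lawson, and the disagreement score `split_half_disagreement` and the example data are hypothetical choices.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])       # farthest point from chosen centers
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = np.argmin(np.linalg.norm(X[:, None, :] - centers[None], axis=2), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers

def split_half_disagreement(X, k, rng=None):
    """Cluster two random halves separately; return the largest distance from a
    center found in one half to the nearest center found in the other half."""
    rng = np.random.default_rng(rng)
    perm = rng.permutation(len(X))
    half = len(X) // 2
    ca, cb = kmeans(X[perm[:half]], k), kmeans(X[perm[half:]], k)
    d = np.linalg.norm(ca[:, None, :] - cb[None, :, :], axis=2)
    return d.min(axis=1).max()

# Hypothetical data: three well-separated groups in two variables.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(mu, 0.3, (50, 2)) for mu in ([0, 0], [5, 0], [0, 5])])
print(split_half_disagreement(X, 3, rng=1))
```

A small disagreement relative to the spacing between clusters is consistent with clusters that are not artifacts of one particular subsample; repeating the split many times gives a crude resampling distribution for the score.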

As Jurs and Lawson so clearly show, cluster analysis has the potential to yield important insight and direction in the study of classes of chemical compounds.
Abstract

The use of calibration models to predict analyte concentrations in samples showing responses from poorly calibrated components, or samples showing drift in the instrumental response function, is seldom successful. These incomplete calibration models cannot account for variations not encountered in the calibration step. Simple modifications are possible which remedy this difficulty for classical least squares (CLS) regression. By using sequential regression for the prediction step, extensions are possible which lessen errors due to overfitting, and permit prediction of well-modelled components in the presence of unmodelled components. Implementation of the sequential regression is conveniently done through use of the Kalman filter. Use of filter models for dynamics and measurement also permits correction of drift of various types. The use of CLS calibration with Kalman filter prediction is presented and tested with simulated spectroscopic data. Comparisons are made to other calibration and prediction methods.


INTRODUCTION 

Care in the calibration step is very important for a successful multicomponent analysis. During the initial phase of a calibration, when standard mixtures of analytes are measured, effort must be made to calibrate over the widest possible range of instrumental conditions, analyte concentrations, and potential interferences. From these calibration data, a calibration model is generated which explains as much as possible of the variations seen during the calibration step. The model is used to predict analyte concentrations from further measurements made on predictors. Care in collecting calibration data and generating a calibration model is repaid in the range over which the calibration remains valid during prediction.

Even with great care in calibration, there is still the likelihood of instrumental drift with time, and the chance that small changes in the nature of the sample may appear in the form of unexpected (and uncalibrated) components. Drift and unmodelled responses present two significant challenges to calibration schemes. Both can be regarded as unmodelled components in the calibration, but the effects of these unmodelled components are seen during prediction. The development of calibration methods that are more robust to the effects of instrumental drift and unmodelled components would greatly extend the useful range of many calibration schemes.

Considerable research has been directed at methods for improving the modelling of the calibration step. Methods based on regression of data onto factor models or on the relation of latent variables have been developed to improve the calibration process by lessening the effects of noise in the calibration model (1,2). These methods have shown success in generating very reliable models for the calibration step, but they are less successful at predicting concentrations for multicomponent samples, especially those that are observed under conditions far removed from the conditions of calibration. Other methods, for example those based on rank annihilation, might be more suited to treatment of chemical measurement of samples containing well-modelled components coexisting with unknown contaminants (3). Because these methods presume identical spectral or temporal behavior for any well-modelled components, so that second-order or higher data can be rank annihilated, they are more suited to arrays of bilinear spectra than to time-varying calibration systems, which may contain time jitter from run to run (4). That jitter makes registration of the bilinear arrays uncertain, and it causes difficulties in the rank reduction process. Drift in the instrumental response is also problematic for rank reduction methods because of the lack of reproducibility of the time-varying responses of standards and samples.

The prediction step can be considered a time-series process, and it seems reasonable to apply methods intended for time-series analysis in attempting to create calibration models which are more robust to errors in the prediction step. Since the time series involved are multivariate, given the multicomponent chemical models and the multicomponent responses observed, a multivariate approach is appropriate.

One multivariate, time-based approach that might be examined is the Kalman filter. Although many of its time-series properties have not been used to full advantage in applications in analytical chemistry, this algorithm has been extensively used for analysis of multicomponent data (5). Previous work from this laboratory (6,7) has demonstrated that modified Kalman filter methods may be advantageous for multicomponent analysis in the presence of unanticipated and unmodelled responses in a multicomponent signal. Some work on drift compensation of univariate systems (8) has also appeared.

This paper demonstrates that one form of calibration, classical least squares (CLS) calibration, is directly compatible with ordinary Kalman filtering, either in vector or in scalar (sequential regression) form. Additions to the CLS calibration model which account for random drift and for unmodelled responses are presented and discussed. All methods are tested with simulated spectroscopic data.


THEORY 

Classical  least  squares  calibration 

For analysis of a set of compounds contained in a mixture, any of the standard methods of multicomponent calibration can be used. CLS calibration, sometimes called K-matrix calibration, is convenient for use here because of its assumption of the least-squares causal model relating the measured responses $\mathbf{A}_s$ of standards to their known concentrations $\mathbf{C}_s$,

$$\mathbf{A}_s = \mathbf{C}_s\mathbf{K} + \mathbf{E} \qquad (1)$$

where the $n \times p$ matrix $\mathbf{K}$ relates the $m$ spectra collected over $p$ sensor channels to the $m \times n$ concentrations in $\mathbf{C}_s$, and $\mathbf{E}$ holds the residual errors. From the calibration step, where both $\mathbf{A}_s$ and $\mathbf{C}_s$ are known, the matrix $\mathbf{K}$ is easily obtained from

$$\mathbf{K} = (\mathbf{C}_s^T\mathbf{C}_s)^{-1}\mathbf{C}_s^T\mathbf{A}_s \qquad (2)$$

The rows of the ‘K-matrix’, $\mathbf{K}$, are estimates of the pure-component spectra of species involved in the calibration. Once the calibration is completed, the matrix $\mathbf{K}$ can also be used to estimate the concentrations of analytes $\mathbf{C}_u$ in unknown samples, since

$$\hat{\mathbf{C}}_u = \mathbf{A}_u\mathbf{K}^T(\mathbf{K}\mathbf{K}^T)^{-1} \qquad (3)$$
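Eqs. (2) and (3) translate directly into two linear solves. The sketch below is a minimal NumPy illustration, not the authors' program; the Gaussian band shapes, noise level and concentrations are hypothetical simulated values. Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverses written in the equations.

```python
import numpy as np

def cls_calibrate(A_s, C_s):
    """Eq. (2): K = (C_s^T C_s)^-1 C_s^T A_s, via a linear solve instead of an explicit inverse."""
    return np.linalg.solve(C_s.T @ C_s, C_s.T @ A_s)

def cls_predict(A_u, K):
    """Eq. (3): C_u = A_u K^T (K K^T)^-1, again via a linear solve."""
    return np.linalg.solve(K @ K.T, K @ A_u.T).T

# Hypothetical simulated system: n = 3 Gaussian bands on p = 100 channels,
# m = 10 standards, small Gaussian measurement noise.
rng = np.random.default_rng(0)
wl = np.linspace(0.0, 1.0, 100)
K_true = np.vstack([np.exp(-((wl - c) / 0.08) ** 2) for c in (0.3, 0.5, 0.7)])  # n x p
C_s = rng.uniform(0.1, 1.0, size=(10, 3))                                       # m x n
A_s = C_s @ K_true + rng.normal(0.0, 1e-3, size=(10, 100))                       # m x p

K = cls_calibrate(A_s, C_s)            # rows of K estimate the pure-component spectra
C_true = np.array([[0.2, 0.5, 0.8]])
C_hat = cls_predict(C_true @ K_true, K)
print(C_hat)
```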
The accuracy of the estimates obtained from the prediction step depends on the adequacy of the calibration model $\mathbf{K}$, and on the presence of additional, unexpected components altering the multicomponent response $\mathbf{A}_u$. These can be additive, as might be the case when additional constituents are present and their responses contribute to the multicomponent signal. They also might be multiplicative, as would be the case when linear or proportional drift caused a change in the instrumental response expected for a given concentration of analytes.
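The two kinds of error can be illustrated with eq. (3) on noise-free simulated spectra. In this hypothetical sketch, an unmodelled band overlapping two calibrated bands biases the predicted concentrations, while a 5 % proportional change in the response simply scales every estimate by the same factor. The band positions, widths and amplitudes are illustrative assumptions only.

```python
import numpy as np

wl = np.linspace(0.0, 1.0, 100)

def band(center, width=0.08):
    """Hypothetical Gaussian pure-component spectrum."""
    return np.exp(-((wl - center) / width) ** 2)

K = np.vstack([band(0.3), band(0.5), band(0.7)])   # calibrated components (n x p)
c_true = np.array([0.2, 0.5, 0.8])

def cls_predict(a_u, K):
    """Eq. (3) for a single spectrum a_u of length p."""
    return np.linalg.solve(K @ K.T, K @ a_u)

clean = c_true @ K
additive = clean + 0.3 * band(0.6)      # unmodelled interferent overlapping bands 2 and 3
proportional = 1.05 * clean             # 5 % proportional (multiplicative) drift

for name, a_u in [("clean", clean), ("additive", additive), ("proportional", proportional)]:
    print(name, cls_predict(a_u, K) - c_true)
```

Because eq. (3) is linear in $\mathbf{A}_u$, neither kind of error can be detected from the CLS estimates alone; that motivates the drift and innovation-based models developed below.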

While calibration based on classical least squares is well understood, since it is one form of ordinary multiple linear regression, it may not always be the best method for calibration. Some of the undesirable features of a calibration based on CLS regression include possible overfitting of data to the calibration models, where parts of the unknown response are fitted to noise in the calibration models (1).

Sequential  regression  for  prediction 

One way to alter CLS calibration is to perform the regression of the unknown response onto the models sequentially, rather than in a single step. Sequential regression of data onto the classical causal model of eq. (1) is well established (9-11). The algorithm is given by three equations: one for the update of the regression parameters (here, the unknown concentrations), one for the update of the covariance of the estimates, and one for the correction factor used to adjust the current estimates $\hat{\mathbf{C}}_u$ and $\mathbf{P}$ to account for the information contained in new data. If the regression parameters are contained in the $n \times 1$ vector $\hat{\mathbf{C}}_u$, with covariance $\mathbf{P}$, the recursion relations, expressed for the $k$th channel of a $p$-channel spectrum, are

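$$\hat{\mathbf{C}}_u(k) = \hat{\mathbf{C}}_u(k-1) + \mathbf{L}(k)\left[a_u(k) - \mathbf{K}^T(k)\hat{\mathbf{C}}_u(k-1)\right] \qquad (4)$$

$$\mathbf{L}(k) = \frac{\mathbf{P}(k-1)\mathbf{K}(k)}{1/a(k) + \mathbf{K}^T(k)\mathbf{P}(k-1)\mathbf{K}(k)} \qquad (5)$$

$$\mathbf{P}(k) = \mathbf{P}(k-1) - \frac{\mathbf{P}(k-1)\mathbf{K}(k)\mathbf{K}^T(k)\mathbf{P}(k-1)}{1/a(k) + \mathbf{K}^T(k)\mathbf{P}(k-1)\mathbf{K}(k)} \qquad (6)$$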


In these equations, $a(k)$ is the weight given to observation $a_u(k)$, the $k$th element of $\mathbf{A}_u$; $\mathbf{K}(k)$ is the $k$th column of $\mathbf{K}$; and $\mathbf{L}(k)$ is the correction factor used to update $\hat{\mathbf{C}}_u(k)$ and $\mathbf{P}(k)$. Careful choice of appropriate values for $a(k)$ will reduce the problem of overfitting mentioned above. The calibration matrix $\mathbf{K}$ can be calculated directly from eq. (2) above, or it can be obtained by sequential regression of the spectra obtained during calibration runs onto the standard concentrations, using a regression approach analogous to that in eqs. (4)-(6).

Sequential regression requires initial guesses $\hat{\mathbf{C}}_u(0)$ and the covariance matrix $\mathbf{P}(0)$, a measure of the uncertainty of the initial guess $\hat{\mathbf{C}}_u(0)$. The covariance matrix has units of concentration squared, and its diagonal elements are the variances associated with each element of the concentration vector $\hat{\mathbf{C}}_u$.

With correct regression models, the sequential estimates quickly become independent of the initial guess $\hat{\mathbf{C}}_u(0)$, provided that a ‘reasonable’ value is selected for $\mathbf{P}(0)$. Values of about 1-100 times $\hat{\mathbf{C}}_u(0)$ work well for the diagonal values of $\mathbf{P}(0)$; the off-diagonal elements may be set to zero. Larger values of $\mathbf{P}(0)$ typically aid in getting rapid convergence. When $\mathbf{P}(0)$ is selected too small, the sequential regression gives biased results for $\hat{\mathbf{C}}_u$ (9).
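Eqs. (4)-(6), with the weights of eq. (7), can be sketched as a short loop over channels. This is a minimal illustration under stated assumptions, not the authors' program: the function name, the simulated Gaussian bands, the noise level, and the choices `P(0) = 100 I` (generous) and `P(0) = 1e-8 I` (too small) are all hypothetical. With a generous $\mathbf{P}(0)$ the result matches one-step weighted least squares; with a very small $\mathbf{P}(0)$ the estimates stay near the initial guess, i.e. they are biased.

```python
import numpy as np

def sequential_regression(z, K, c0, P0, a):
    """Eqs. (4)-(6): channel-by-channel update of the estimate c and covariance P.

    z : (p,) measured spectrum a_u;  K : (n, p) calibration matrix
    c0: (n,) initial guess C_u(0);   P0: (n, n) initial covariance P(0)
    a : (p,) weights a(k), i.e. reciprocal noise variances as in eq. (7)
    """
    c, P = np.array(c0, dtype=float), np.array(P0, dtype=float)
    for k in range(len(z)):
        h = K[:, k]                       # K(k): sensitivities of all components at channel k
        Ph = P @ h
        denom = 1.0 / a[k] + h @ Ph       # 1/a(k) + K^T(k) P(k-1) K(k)
        L = Ph / denom                    # gain, eq. (5)
        c = c + L * (z[k] - h @ c)        # eq. (4)
        P = P - np.outer(Ph, Ph) / denom  # eq. (6); P symmetric, so P K K^T P = (Ph)(Ph)^T
    return c, P

# Hypothetical simulated data: three Gaussian bands, noise standard deviation 0.01.
wl = np.linspace(0.0, 1.0, 100)
K = np.vstack([np.exp(-((wl - c) / 0.08) ** 2) for c in (0.3, 0.5, 0.7)])
c_true = np.array([0.2, 0.5, 0.8])
rng = np.random.default_rng(0)
z = c_true @ K + rng.normal(0.0, 0.01, wl.size)
a = np.full(wl.size, 1.0 / 0.01 ** 2)

c_seq, P = sequential_regression(z, K, np.zeros(3), 100.0 * np.eye(3), a)
W = np.diag(a)
c_batch = np.linalg.solve(K @ W @ K.T, K @ W @ z)      # one-step weighted least squares
c_tight, _ = sequential_regression(z, K, np.zeros(3), 1e-8 * np.eye(3), a)
print(c_seq, c_batch, c_tight)
```

No matrix inverse appears inside the loop; each step needs only a scalar division, which is one of the advantages of sequential regression noted in the next paragraph.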

While sequential regression may not always be as computationally efficient as ordinary regression, it sometimes can be more efficient, depending on the number of parameters to be fitted, the dimension of the measurement, and the weighting factors. Sequential regression has a computational advantage over ordinary regression where few parameters are to be fitted to a high-dimension measurement, and where weighting data are available for use in the fitting; this situation is common in the analysis of multicomponent data in analytical chemistry. Sequential regression also offers other advantages. Two of these are the elimination of the need for matrix inversion, and the possibility of using prior information on the values and/or distribution of $\hat{\mathbf{C}}_u$ and $\mathbf{P}$. A third advantage is the ease with which the regression problem can be recast into forms suited to analysis by regression methods based on loss functions other than simple least squares.
Within a Bayesian framework, for example, $\mathbf{C}_u$ can be considered a random parameter vector with some prior distribution, and the set of observations should be correlated with $\mathbf{C}_u$. The posterior probability density function for $\mathbf{C}_u$ is desired at some point $k$, that is, $p(\mathbf{C}_u \mid \mathbf{A}_u)$. The estimate $\hat{\mathbf{C}}_u$ can be obtained from the distribution; a common approach is to use the value for which the distribution attains a maximum, the maximum a posteriori (MAP) estimate. For a symmetric, unimodal distribution, the MAP estimate coincides with the mean of the distribution, and it is also the value that minimizes the parameter error variance $E[(\mathbf{C}_u - \hat{\mathbf{C}}_u)(\mathbf{C}_u - \hat{\mathbf{C}}_u)^T]$. The problem is to determine the evolution of the density function (or its mean) with added data. In general, this problem cannot be solved exactly, but if the measurement noise $e$ is taken as Gaussian, an exact solution is possible. Under these constraints, the optimal weighting of observations is given by the relation

$$1/a(k) = E\left[(e(k) - \bar{e}(k))^T(e(k) - \bar{e}(k))\right] \qquad (7)$$

where $\bar{e}(k)$ is the mean of $e(k)$.

Given this definition of the weighting, the sequential regression can also be cast into a form amenable to use with the scalar form of the Kalman filter, with system dynamics model

$$\mathbf{X}(k+1) = \mathbf{F}(k)\mathbf{X}(k) + \mathbf{w}(k) \qquad (8)$$

and measurement model

$$z(k) = \mathbf{H}^T(k)\mathbf{X}(k) + v(k) \qquad (9)$$

where, for simple K-matrix prediction, the filter state $\mathbf{X}$ is the vector $\mathbf{C}_u$, the filter measurement matrix $\mathbf{H}$ is the calibration matrix $\mathbf{K}$, the filter measurement $z$ is the spectral datum $a_u$, and the filter noise parameter $v$ describes the calibration measurement error $e$. If the filter dynamics matrix is set to identity for this time-dependent problem, and the filter system noise $\mathbf{w}$ is taken as zero, eqs. (4)-(6) may be seen to be identical with the update equations from the scalar Kalman algorithm (eqs. (A3)-(A5) in the Appendix), where the vector quantity $\mathbf{L}(k)$ is the Kalman gain. The filter time projection equations, (A1) for the state and (A2) for the state covariance, are identities in this example, because the filter model for system dynamics (eq. (8)) is an identity in this analysis, and because $\mathbf{Q}(k) = E[\mathbf{w}(k)\mathbf{w}^T(k)] = \mathbf{0}$ and $R(k) = E[v(k)v^T(k)] = 1/a(k)$ when the system noise is zero and the measurement noise is defined by eq. (7) above. Use of Kalman filter methods therefore offers a general, flexible framework for classical least squares calibration and prediction, since classical least squares can be taken as a subset of the more general filtering approach, with identity system dynamics and uniform weighting.

Modelling  drift  in  CLS  prediction 

The system dynamics matrix $\mathbf{F}(k)$ of the Kalman filter need not be the identity, however. A model for drift can be used to describe filter state dynamics, thus extending the CLS calibration model to track drifting multicomponent systems. Random and linear drift models are believed to describe many chemical systems (8). A drifting parameter $X$ is generally described by a linear equation

$$X(t) = X(t-1) + d(t) \qquad (10)$$

where $d(t)$ is the drift. Random drift occurs when $d(t)$ is a random parameter, while linear drift results when $d(t)$ varies systematically with time. If the state is defined as $\mathbf{X}(t) = [\mathbf{C}_u(t), \mathbf{d}(t)]$, this system model leads to a simple system dynamics model, namely


with the measurement matrix as the time-independent quantity

$$\mathbf{H} = \begin{bmatrix}\mathbf{K}\\ \mathbf{0}\end{bmatrix} \qquad (12)$$


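This dynamic model is observable if the matrix $[\mathbf{H}\;\; \mathbf{F}^T\mathbf{H}\;\; (\mathbf{F}^T)^2\mathbf{H}\;\; \cdots\;\; (\mathbf{F}^T)^{N-1}\mathbf{H}]$ is of rank $N$ for the $N$-dimensional state vector $\mathbf{X}$ (here $N = 2n$) (11). In this instance, this matrix is of full rank if $\mathbf{K}$ is of full rank, and if duplicate measurements are made on each sample, so that the drift variables in $\mathbf{d}$ can be characterized.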
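Eq. (11) is missing from this copy, so the sketch below assumes the common linear-drift form $\mathbf{C}_u(t+1) = \mathbf{C}_u(t) + \mathbf{d}(t)$, $\mathbf{d}(t+1) = \mathbf{d}(t)$, i.e. $\mathbf{F} = [[\mathbf{I}, \mathbf{I}], [\mathbf{0}, \mathbf{I}]]$; this form, the function names, the small system noise on $\mathbf{d}$, and all numerical values are hypothetical. It checks the observability rank for $\mathbf{H}$ of eq. (12), then tracks a linearly drifting mixture by projecting the state between spectra and applying scalar updates channel by channel.

```python
import numpy as np

def drift_model(n):
    """Hypothetical linear-drift dynamics: c(t+1) = c(t) + d(t), d(t+1) = d(t)."""
    I, Z = np.eye(n), np.zeros((n, n))
    return np.block([[I, I], [Z, I]])

def observability_rank(F, H):
    """Rank of [H, F^T H, (F^T)^2 H, ..., (F^T)^(N-1) H] for the model z = H^T x."""
    blocks, M = [], H
    for _ in range(F.shape[0]):
        blocks.append(M)
        M = F.T @ M
    return np.linalg.matrix_rank(np.hstack(blocks))

def track(spectra, K, F, Q, P0, r):
    """Project the state between spectra, then update it channel by channel."""
    n = K.shape[0]
    x, P = np.zeros(2 * n), P0.copy()
    for t, z in enumerate(spectra):
        if t > 0:                                   # time projection between spectra
            x, P = F @ x, F @ P @ F.T + Q
        for k in range(len(z)):                     # scalar measurement updates
            h = np.concatenate([K[:, k], np.zeros(n)])
            Ph = P @ h
            s = r + h @ Ph
            x = x + Ph * (z[k] - h @ x) / s
            P = P - np.outer(Ph, Ph) / s
    return x[:n], x[n:]                             # concentrations, drift per step

wl = np.linspace(0.0, 1.0, 100)
K = np.vstack([np.exp(-((wl - c) / 0.08) ** 2) for c in (0.3, 0.5, 0.7)])
n = 3
F = drift_model(n)
H = np.vstack([K, np.zeros((n, wl.size))])          # eq. (12)
print(observability_rank(F, H), observability_rank(np.eye(2 * n), H))

rng = np.random.default_rng(0)
c0, delta = np.array([0.2, 0.5, 0.8]), np.array([0.02, -0.01, 0.015])
spectra = [(c0 + t * delta) @ K + rng.normal(0.0, 0.005, wl.size) for t in range(10)]
Q = np.diag([0.0] * n + [1e-8] * n)
c_hat, d_hat = track(spectra, K, F, Q, np.eye(2 * n), 0.005 ** 2)
print(c_hat, d_hat)
```

With $\mathbf{F} = \mathbf{I}$ the drift states never reach the measurements and the rank drops to $n$; with the assumed drift coupling, a sequence of repeated measurements makes all $2n$ states observable.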

Other forms of instrumental drift are just as easily modelled. With proportional drift, the response at some time $t$ might be related to the response at an earlier time $t-1$ by $z(t) = \gamma(t)z(t-1)$, where $\gamma(t)$ is a time-dependent, random parameter. This leads to a system dynamics equation that is now a function of time, $t$:

$$\mathbf{C}_u(t+1) = \gamma(t)\mathbf{C}_u(t) + \mathbf{w}(t) \qquad (13)$$

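Now, the set of parameters $\mathbf{C}_u$ and the random parameter $\gamma$ must both be estimated to obtain $\mathbf{C}_u$ in the presence of random drift. Define the system state as $\mathbf{X}(t) = [\mathbf{C}_u(t), \gamma(t-1)]$. Then the filter models are

$$\begin{bmatrix}\mathbf{C}_u(t+1)\\ \gamma(t)\end{bmatrix} = \begin{bmatrix}\gamma(t-1)\mathbf{I} & \mathbf{0}\\ \mathbf{0}^T & 1\end{bmatrix}\begin{bmatrix}\mathbf{C}_u(t)\\ \gamma(t-1)\end{bmatrix} + \mathbf{w}(t) \qquad (14)$$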


to account for drift over time between spectral measurements, and

$$\mathbf{z}(t) = \mathbf{K}^T\mathbf{C}_u(t) + \mathbf{v}(t) \qquad (15)$$

to describe the calibration during the prediction of this particular spectral measurement. As discussed above, the prediction step may be solved directly, with a matrix inversion, to obtain state estimates $\hat{\mathbf{C}}_u$, or it may be broken down into a series of scalar relations defining the sequential regression of $z$ onto $\mathbf{K}$:

$$z(k) = \mathbf{K}^T(k)\mathbf{C}_u + v(k) \qquad (16)$$

Such decomposition of the measurement vector $\mathbf{z}$ into a sequence of scalar measurements $z(k)$ is common in the engineering literature (10,12). For the filter models described by eqs. (13) and (16), the index $t$ describes time between spectral measurements, while the index $k$ describes scalar components of the measurement. The state $\mathbf{X}$ will be both time- and wavelength-dependent, but since only the state estimates at the end of the update process are of interest, and not the evolution of states during the sequence of scalar updates, states are given in terms of time for this model. State projection occurs between measurements of full spectra, while state update occurs for each spectral channel.

In this treatment, it is assumed that spectral measurement is fast, and that drift during collection of a spectrum is negligible. If so, the system dynamics can be expressed as the time- and state-dependent quantity

$$\mathbf{f}(\mathbf{X}, t) = \begin{bmatrix}\gamma(t-1)\mathbf{C}_u(t)\\ \gamma(t-1)\end{bmatrix} \qquad (17)$$

and the measurement matrix is defined as the time-independent quantity defined in eq. (12) above. This filter model is nonlinear, since the system dynamics depend upon the present value of the filter state. The states of this model may be estimated by use of the extended Kalman filter (5,9-11). In essence, the extended filter provides a way to linearize the system dynamics $\mathbf{f}$ about the current state estimates, so that

$$\mathbf{F}(\hat{\mathbf{X}}) = \left.\frac{\partial\mathbf{f}}{\partial\mathbf{X}}\right|_{\mathbf{X}=\hat{\mathbf{X}}} = \begin{bmatrix}\hat{\gamma}(t-1)\mathbf{I} & \hat{\mathbf{C}}_u(t)\\ \mathbf{0}^T & 1\end{bmatrix} \qquad (18)$$


where $\hat{\mathbf{C}}_u(t)$ and $\hat{\gamma}(t-1)$ are the current state estimates in the extended Kalman filter. It is possible to perform sequential regression over the spectral data to obtain estimates of the states $\mathbf{C}_u$ for a given spectrum and time, then proceed through the extended Kalman filter to provide predictions of drift between spectral measurements, as described above. If a good estimate of the system noise $\mathbf{Q}(t)$ is available, accurate estimation of the true concentrations and the apparent drift in concentration should be possible using these simple modifications to CLS prediction.
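The linearization in eqs. (17) and (18) can be checked numerically. This sketch assumes the state ordering $\mathbf{X} = [\mathbf{C}_u(t), \gamma(t-1)]$ used above; the function names, the covariance values and the drift factor 1.05 are hypothetical. It compares the analytic Jacobian with a central finite difference and performs one extended-Kalman time projection, $\hat{\mathbf{X}} \rightarrow \mathbf{f}(\hat{\mathbf{X}})$ and $\mathbf{P} \rightarrow \mathbf{F}\mathbf{P}\mathbf{F}^T + \mathbf{Q}$.

```python
import numpy as np

def f(x):
    """Eq. (17): x = [C_u(t), gamma(t-1)] -> [gamma(t-1) C_u(t), gamma(t-1)]."""
    c, g = x[:-1], x[-1]
    return np.append(g * c, g)

def jacobian(x):
    """Eq. (18): df/dx evaluated at the current estimate x."""
    c, g = x[:-1], x[-1]
    n = c.size
    J = np.zeros((n + 1, n + 1))
    J[:n, :n] = g * np.eye(n)      # d(gamma * C_i) / dC_j
    J[:n, n] = c                   # d(gamma * C_i) / d(gamma)
    J[n, n] = 1.0                  # gamma carried forward unchanged
    return J

def ekf_project(x, P, Q):
    """One extended-Kalman time projection, linearized about the current estimate."""
    F = jacobian(x)
    return f(x), F @ P @ F.T + Q

x = np.array([0.2, 0.5, 0.8, 1.05])          # hypothetical estimates, 5 % drift factor
eps = 1e-6
numeric = np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(4)])
x1, P1 = ekf_project(x, 1e-4 * np.eye(4), np.zeros((4, 4)))
print(np.abs(numeric - jacobian(x)).max(), x1)
```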

Examination of the equations for the Kalman filter (eqs. (A1)-(A7)) demonstrates that the equations for updating state estimates are decoupled from those used to project states ahead in time. There is no reason why other regression-based prediction methods which employ externally supplied initial guesses cannot be used in conjunction with the projection equations used in the Kalman filter. In this way, other calibration methods might be extended to account for drift between samples, or for other time-dependent effects.


Compensating for unmodelled responses in prediction


If the measurement model is in error, ordinary regression of data onto the spectral models will produce inaccurate estimates of concentration. With any recursive algorithm, there is also the possibility of skipping the processing of data that are corrupted by poorly modelled signals. This feature can be used to avoid regions of data for which models are in error, provided some means of evaluating the model quality can be found.

Adaptive  filtering  for  estimation  of  noise  processes 

Several indicators exist for model quality. The most reliable are based on the filter innovations, a measure of how well the filter model can predict new data. For scalar Kalman filtering, the filter innovations are defined as

$$\nu(k) = z(k) - \mathbf{H}^T(k)\hat{\mathbf{X}}(k|k-1) \qquad (19)$$

where $\hat{\mathbf{X}}(k|k-1)$ is the projected state at point $k$, based on information up through point $k-1$. One possible way to evaluate innovation quality is to compare the observed innovations sequence $\nu(k)$ with that expected from the filter theory. With a correct filter model, the innovations variance arising from uncertainty in the states is $\mathbf{H}^T(k)\mathbf{P}(k|k-1)\mathbf{H}(k)$, assuming no correlation of state and measurement noise. This quantity accounts for the presence of error in $z(k)$ which is not part of the filter model $\mathbf{H}(k)$. With a correct model, the error in $z(k)$ is random, and its variance is $R = E[v(k)v^T(k)]$. According to theory, for a correct filter model with Gaussian noise on the measured data, the filter innovations will also be Gaussian. In addition, the innovations will have a mean value of 0 and a standard deviation $\sigma_\nu(k)$, the square root of the sum of these two variances. When the observed innovations deviate significantly from theory, model error must be present (11,13).
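The comparison of observed and theoretical innovations can be sketched by recording, during the sequential regression, each innovation of eq. (19) together with its predicted variance $\mathbf{K}^T(k)\mathbf{P}(k|k-1)\mathbf{K}(k) + R$. In this hypothetical example (simulated Gaussian bands, noise standard deviation 0.01, an assumed unmodelled band at 0.6), the standardized innovations of the correct model behave like unit-variance noise and their squared sum is close to $p - n$, while the unmodelled band produces innovations many standard deviations from zero.

```python
import numpy as np

def innovations(z, K, c0, P0, r):
    """Sequential regression recording the innovations nu(k) (eq. (19)) and their
    predicted variance sigma_nu^2(k) = K^T(k) P(k|k-1) K(k) + r."""
    c, P = np.array(c0, dtype=float), np.array(P0, dtype=float)
    nu, var = np.empty(len(z)), np.empty(len(z))
    for k in range(len(z)):
        h = K[:, k]
        Ph = P @ h
        var[k] = h @ Ph + r            # state uncertainty plus measurement noise
        nu[k] = z[k] - h @ c           # innovation before the update
        c = c + Ph * nu[k] / var[k]
        P = P - np.outer(Ph, Ph) / var[k]
    return nu, var

wl = np.linspace(0.0, 1.0, 100)
band = lambda center: np.exp(-((wl - center) / 0.08) ** 2)
K = np.vstack([band(0.3), band(0.5), band(0.7)])
c_true = np.array([0.2, 0.5, 0.8])
rng = np.random.default_rng(0)
good = c_true @ K + rng.normal(0.0, 0.01, wl.size)   # correct model
bad = good + 0.3 * band(0.6)                          # hypothetical unmodelled component

for z in (good, bad):
    nu, var = innovations(z, K, np.zeros(3), 100.0 * np.eye(3), 0.01 ** 2)
    n_std = nu / np.sqrt(var)                         # standardized innovations
    print(np.std(n_std), np.abs(n_std).max(), np.sum(n_std ** 2))
```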

The actual error being evaluated in any comparison of observed and theoretical innovations is, however, error in modelling $R$, and not $\mathbf{H}$. In the theory of the Kalman filter, it is assumed that, in addition to being Gaussian noise processes with covariances $R$ and $\mathbf{Q}$, the noise sequences $v(k)$ and $\mathbf{w}(t)$ have zero means. Any error in modelling the measurement matrix $\mathbf{H}$ will be indicated by a nonzero mean for $v(k)$, while errors in modelling $\mathbf{F}$ will appear as a nonzero mean for $\mathbf{w}(t)$. An adaptive filter tests the modelling of the filter noise variances. If the additional assumption is made that $R$ and $\mathbf{Q}$ are well modelled, however, any modelling error detected may then be assigned to nonzero noise means. For an adaptive filter based on matching of theoretical and experimental innovations, the error is attributed to deviations in the presumed mean of $v$. This model error can be ‘covered up’ by artificially increasing the measurement variance $R(k)$, which effectively down-weights the parts of the spectral data that are not well modelled. Any regression done with incomplete models, however, is suboptimal, and the results obtained from adaptive filtering are not always minimum-variance estimates. Operation of this filter requires averaging of a set of innovations prior to comparison with theory, for better statistical properties (6). The lag introduced by the averaging process makes the filter slow to converge to good estimates of states. Estimates obtained from the covariance-matching adaptive filter are very dependent on the initial guesses used to begin filtering, and simplex optimization has been needed to locate the best filtering results, as well as to automate this adaptive filter (7). Because of these undesirable features, the covariance-matching adaptive filter was not used here.

Adaptive filtering by innovations correlation matching

Another check on model quality can be done by investigation of the autocorrelation of the innovations. Matching the observed innovations autocorrelation over a part of the innovations sequence to that expected from filter theory permits estimation of the noise variances required for the filter model. For correlation matching, the autocorrelation function $\phi$ is calculated for the innovations over some window of autocorrelation lags. Then, the experimental autocorrelation $\phi$ is related to the theoretical autocorrelation $\Phi$ by the equation

$$\phi(k,l) = \Phi(k,l)\,\boldsymbol{\alpha} + \eta(k,l) \qquad (20)$$

for datum $k$ and lag $l$, where $\eta(k,l)$ is a zero-mean, white noise term, and $\boldsymbol{\alpha}$ is the fitted parameter, taken here as independent of $k$. The noise variances are expressed as linear functions of $\boldsymbol{\alpha}$, so that

$$R(k) = \sum_{i=1}^{s} r_i(k)\,\alpha_i \qquad (21)$$

and

$$\mathbf{Q} = \sum_{i=1}^{N} \mathbf{Q}_i\,\alpha_i \qquad (22)$$

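The parameter $\boldsymbol{\alpha}$ is obtained from the observed innovations autocorrelation by the sequential regression

$$\hat{\boldsymbol{\alpha}}(k) = \hat{\boldsymbol{\alpha}}(k-1) + \boldsymbol{\Theta}(k)\Phi^T(k)\mathbf{W}^{-1}(k)\left[\phi(k) - \Phi(k)\hat{\boldsymbol{\alpha}}(k-1)\right] \qquad (23)$$

where the covariance parameter $\boldsymbol{\Theta}$ is propagated by

$$\boldsymbol{\Theta}(k) = \boldsymbol{\Theta}(k-1) - \boldsymbol{\Theta}(k-1)\Phi^T(k)\left[\mathbf{W}(k) + \Phi(k)\boldsymbol{\Theta}(k-1)\Phi^T(k)\right]^{-1}\Phi(k)\boldsymbol{\Theta}(k-1) \qquad (24)$$

and where $\mathbf{W}$ is a weight matrix determined by the autocorrelation at lag 0 (14).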

While the computationally simpler matching of theoretical and experimental innovations can also be used to estimate noise variances, the results of Monte Carlo studies show that noise estimates from these adaptive filters tend to be biased (15). Further, with matching of observed and theoretical innovations autocorrelation, it is possible to estimate both noise variances ($R$ and $\mathbf{Q}$) at once, and these estimates are not strongly affected by measurement model error in the data (14). Once these quantities have been estimated, subsequent estimation of noise means (deviations of $E(\mathbf{w})$ and $E(v)$ from zero) can be performed. For this reason, innovations correlation was used to obtain estimates of $R$ and $\mathbf{Q}$ for filter studies throughout this work.
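The first ingredient of correlation matching, the experimental autocorrelation $\phi$ of the innovations over a window of lags, can be sketched as below. This covers only $\phi$, not the regression of eqs. (23)-(24), because the theoretical autocorrelation $\Phi(k,l)$ depends on filter details not given here. The function name, the 1/N normalization and the AR(1) example standing in for innovations distorted by model error are illustrative assumptions.

```python
import numpy as np

def innovation_autocorrelation(nu, max_lag):
    """Experimental autocorrelation phi(l) = (1/N) * sum_k nu(k) nu(k+l), l = 0..max_lag."""
    nu = np.asarray(nu, dtype=float)
    N = nu.size
    return np.array([nu[: N - l] @ nu[l:] / N for l in range(max_lag + 1)])

rng = np.random.default_rng(0)
white = rng.normal(size=2000)                # innovations from a correct model
coloured = np.empty_like(white)              # hypothetical model error: AR(1) structure
coloured[0] = white[0]
for k in range(1, white.size):
    coloured[k] = 0.8 * coloured[k - 1] + white[k]

phi_w = innovation_autocorrelation(white, 5)
phi_c = innovation_autocorrelation(coloured, 5)
print(phi_w / phi_w[0])                      # near zero beyond lag 0
print(phi_c / phi_c[0])                      # decays slowly: correlated innovations
```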

Adaptive filtering by estimating innovations variance

A third approach to adaptive filtering makes use of the available error information carried in the filter innovations and the state covariance matrix $\mathbf{P}$. For a chemical system with no system dynamics, the variance in the innovations is expected to be a function of the measured variables $z$, the states $\mathbf{X}$ and the measurement model $\mathbf{H}$ according to the relation


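which yields upon substitution the relation

$$\sigma_\nu(k) = \left[R(k) + \hat{\mathbf{X}}^T(k|k-1)\,\mathbf{Q}\,\hat{\mathbf{X}}(k|k-1) + \mathbf{H}^T(k)\mathbf{P}(k|k-1)\mathbf{H}(k)\right]^{1/2} \qquad (26)$$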

This equation reflects the fact that the innovation uncertainty $\sigma_\nu$ must remain large when states are not well known, but must decrease to the limit of measurement noise when states are well known. Since this relation is based on knowledge of $R$, it presumes accurate estimates of noise variances, but it permits rapid rejection of incorrectly modelled data if knowledge of noise variances is available. Data may be filtered normally, or rejected, based on comparison of the innovations $\nu(k)$ and the value of $\sigma_\nu(k)$: innovations falling within $\pm 3\sigma_\nu$ may be considered ‘within those expected for a correct model’, but those innovations falling outside of this range are clear indicators of error in the model.

In this connection, it should be noted that the standardized innovations $n(k)$,

$$n(k) = \nu(k)\,\sigma_\nu(k)^{-1} \qquad (27)$$

may  also  be  defined.  The  squared,  standardized 
innovations  observed  for  filtering  p  measurements 
with  an  ^-dimensional  state  model  distribute  as 
chi-square,  with  p-n  degrees  of  freedom  (10), 
With  this- relation,  a  simple  test  can  be  used  to 
evaluate  model  quality.  A  threshold  can  be  set,  so 
that  innovation  values  falling  within  the  threshold 
are  filtered  normally,  while  those  falling  outside 
the  threshold  are  ignored  in  the  filtering,  and 
affect  neither  the  filter  states  or  covariances.  For 
example,  innovations  well  below  the  threshold 
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Fig  1  Simulated  spectral  responses  for  components  present  in  multicomponent  data  used  for  calibration  and  prediction  studies 
Numbers  1-4  refer  to  the  response  function  created  for  the  three  calibrated  components  and  the  unmodelled  component. 


3 1  a,  |  have  a  fairly  high  probability  of  occurring 
by  chance,  while  those  with  values  greater  than 
3 1  ct,  I  are  not  likely.  In  practice,  though,  an  asym¬ 
metric  threshold  on  innovations  is  desirable.  Large 
positive  innovations  imply  model  error  (for  chem¬ 
ical' responses  with  positive  peaks),  while  large 
negative  innovations  might  be  expected  as  state 
estimates  are  refined.  However,  several  consecu¬ 
tive,  large,  negative  innovations  may  indicate  that 
state  estimates  may  be  affected  by  the  model 
error.  In  this  situation,  it  is  necessary  to  alter  the 
covariance  matrix  P(k \k - 1),  both  to  increase 
the  uncertainty  in  state  estimates  and  to  increase 
<V  Measurements  following  this  change  are 
processed  as  before.  For  work  reported  here,  the 
absolute  threshold  was  set  to  3|<j,  |,  and  two  con¬ 
secutive  measurements  producing  innovations  be¬ 
low  3 1  o,|  caused  reset  of  me  diagonal  elements  of 
the  covariance  matrix  for  all  state  components 
contributing  more  than  5%  of  the  predicted  mea¬ 
surement.  This  selective  reset  was  done  to  avoid 
altering  state  estimates  that  were  not  likely  to  have 
been  influenced  by  the  model  error.  Calculation  of 
the  innovations  threshold  is  fast,  and  the  filtering 
is  set  to  that  most  of  the  data  processed  are 
welt-modelled.  For  these  reasons,  rapid  conver¬ 
gence  of  filter  estimates  is  usually  observed,  even 
though  the  filter  is  not  strictly  optimal,  because 


the  filter  model  is  incomplete.  External  optimiza¬ 
tion  methods  and  extensive  iteration  are  not 
needed  when  this  adaptive  filter  is  used  to  correct 
filter  models.  When  consecutive  negative  innova¬ 
tions  are  encountered,  however,  at  least  one  more 
iteration  should  be  performed  to  insure  satisfac¬ 
tory  estimates  of  states  and  covariances. 
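The gating and reset rules described above can be sketched for a single scalar measurement update. This is a minimal illustration, not the authors' MATLAB code. The following are all assumptions: the function name, the reset value `p_reset`, the choice to still filter a large negative innovation (while counting it), and the use of the standard innovation variance σ_ν² = Hᵀ P(k|k−1) H + R in place of eq. (26), which is not reproduced here.

```python
import numpy as np


def gated_update(x, P, h, z, R, neg_count, k_sigma=3.0, p_reset=1.0, frac=0.05):
    """Scalar Kalman measurement update with an asymmetric innovations gate.

    x, P      : prior state X(k|k-1) and covariance P(k|k-1)
    h         : measurement vector H(k), e.g. pure-component responses at one channel
    z, R      : scalar measurement and its (assumed known) noise variance
    neg_count : number of consecutive large negative innovations seen so far
    Returns (x, P, neg_count, status).
    """
    nu = z - h @ x                      # innovation nu(k)
    s2 = h @ P @ h + R                  # innovation variance (assumed standard form)
    limit = k_sigma * np.sqrt(s2)

    if nu > limit:
        # Large positive innovation: treat as unmodelled response and skip the point
        return x, P, 0, "rejected"

    status = "filtered"
    if nu < -limit:
        # Large negative innovations are still filtered, but counted
        neg_count += 1
        if neg_count >= 2:
            # Selective reset: set the diagonal covariance elements of states that
            # contribute more than `frac` of the predicted measurement to p_reset
            pred = h @ x
            idx = np.flatnonzero(np.abs(h * x) > frac * abs(pred))
            P = P.copy()
            P[idx, idx] = p_reset
            s2 = h @ P @ h + R          # innovation variance with the inflated covariance
            neg_count = 0
            status = "reset"
    else:
        neg_count = 0

    gain = P @ h / s2                   # eq. (A3) for a scalar measurement
    x = x + gain * nu                   # eq. (A4)
    P = P - np.outer(gain, h @ P)       # eq. (A5)
    return x, P, neg_count, status
```

Resetting only the states that contribute appreciably to the predicted measurement mirrors the 5% rule in the text. It leaves estimates for spectrally distant components untouched.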


IMPLEMENTATION 

Programs for CLS calibration, partial least-squares calibration, principal components calibration, and Kalman filtering were all developed in the MATLAB programming environment. The Kalman filtering programs included the linear drift filter, the proportional drift filter based on an extended Kalman filter, the second-order adaptive filter based on covariance matching, and the innovations variance-based adaptive filter for detection of model errors. In all Kalman filters, the Kalman algorithm (eqs. (A1)–(A5)) was used. The MATLAB environment was run on an Apple Macintosh SE equipped with a 68020 processor, 8 Mbytes of memory, a hard disk and a 68882 numeric coprocessor. No effort was made to optimize any of these programs for execution speed.

Data for evaluation of these filter calibration […] such that the peak S/N ratio was 20:1. Data sets containing drift were generated by calculating concentrations using the drift models defined by eqs. (10) and (13). The drift-corrupted concentrations were multiplied by the true spectra, and noise was added to produce sets of noisy, drift-corrupted spectra.



Fig. 2. Estimated spectra from CLS and sequential regression. A. Columns of the K matrix from CLS regression as applied to a 20-member training set, with random Gaussian noise added to obtain a maximum S/N of 150. B. Columns of the K matrix from sequential regression of the same 20-member calibration set as above.
Fig. 3. Innovations from adaptive filtering of incompletely modelled data. A. Innovations from filtering of a 25-member validation set with a maximum S/N of 40, and unmodelled response as given in Fig. 1. A sequential regression calibration was used to generate the adaptive filter model, and estimated values for R (1.5 × 10^-?) and Q (1.0 × 10^-?) were used in the filtering. B. Innovations from filtering of a 25-member validation set with a maximum S/N of 2000, and unmodelled component as given in Fig. 1. Filtering was done as in (A), but with estimated values for R (1.1 × 10^-?) and Q (1.0 × 10^-15).


TABLE 1

Estimation of component concentrations in absence of model error and drift

Method | Prediction set S/N (max.) | Calibration model | R | Q | PRESS*
KF | 39 | S.R.** | 2.6 × 10^-? | 0.0 | 0.0907
CLS | 39 | CLS*** | 0 | 0 | 0.804
KF | 39 | CLS | 2.6 × 10^-? | 1.0 × 10^-? | 0.822
CLS | 39 | S.R. | 0 | 0 | 0.00
KF | 19 | S.R. | 1.06 × 10^-? | 0.0 | 0.524
CLS | 19 | CLS | 0 | 0 | 1.246
KF | 19 | CLS | 1.06 × 10^-? | 1.0 × 10^-? | 1.244
CLS | 19 | S.R. | 0 | 0 | 0.524

* PRESS: the sum of squared error for the predicted spectrum as compared to the true, noise-free spectrum, summed over all calibration components for the 20 members of the prediction set. The same prediction set was used, with different amounts of added noise, in each case, and each method was applied to each set, so that direct comparison is possible.

** S.R.: model from sequential regression of absorbance onto standard concentrations, using an estimated measurement error variance of 2.6 × 10^-?.

*** CLS: model from classical least squares regression of calibration data, without weighting.




RESULTS AND DISCUSSION

A set of calibration standards was prepared by simulation. The component spectra were as shown in Fig. 1. After calibration had been accomplished, prediction was attempted on simulated validation sets, for which the true values of component concentrations, noise variance, and drift were known.

Equivalence of Kalman filter and CLS prediction

To demonstrate the essential equivalence of CLS prediction and sequential regression, the two methods were compared for a well-behaved set of data. Fig. 2 shows plots of the columns of matrix K obtained from sequential regression of the calibration spectra onto the standard concentrations, and as estimated from CLS calibration applied to the same training set data. The noise estimate used in the sequential regression was 1.0 × 10^-4, close to the true noise variance contained in the calibration data. The improvement in estimation obtained through use of the sequential regression is apparent. Overfitting of the calibration data has been diminished substantially by use of the sequential regression, and the agreement of the columns of K estimated from sequential regression with the true spectra is excellent.
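The sequential regression used for calibration can be sketched as one small Kalman filter per sensor channel. In this formulation the state is that channel's row of K (the pure-component responses), each calibration standard supplies one scalar measurement with its concentration vector as H(k), and Q = 0. This is a minimal sketch that relies on several choices the text does not state: the function name, a zero initial guess, a diffuse initial covariance p0·I, and the array layout (A: standards × channels, C: standards × components). With these choices the estimate converges to the batch CLS result K = AᵀC(CᵀC)⁻¹. It therefore illustrates the mechanics only and does not attempt to reproduce the noise reduction shown in Fig. 2.

```python
import numpy as np


def sequential_regression(A, C, R, p0=1e6):
    """Estimate the response matrix K (channels x components) by sequential
    (Kalman filter) regression of absorbances A (standards x channels) onto
    concentrations C (standards x components)."""
    n_std, n_chan = A.shape
    n_comp = C.shape[1]
    K_est = np.zeros((n_chan, n_comp))
    for j in range(n_chan):                 # one independent filter per channel
        x = np.zeros(n_comp)                # initial guess of the responses
        P = p0 * np.eye(n_comp)             # diffuse initial covariance (assumed)
        for i in range(n_std):              # one scalar update per standard
            h = C[i]
            s2 = h @ P @ h + R              # innovation variance
            g = P @ h / s2                  # gain
            x = x + g * (A[i, j] - h @ x)   # state update
            P = P - np.outer(g, h @ P)      # covariance update
        K_est[j] = x
    return K_est
```

Because each channel is filtered independently here, the channels could equally be processed in parallel. This is a design choice of the sketch only.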

Table 1 shows results from a time-independent scalar Kalman filter and CLS prediction applied to a typical set of validation data, with noise taken from a uniform distribution. In this study, the spectral models were generated in two ways. One set was generated by CLS calibration; these models contained noise, as demonstrated in Fig. 2. The other set was generated by sequential regression of the calibration data; these models were virtually noise-free. Both models were used for Kalman filtering and for CLS prediction. The results from the time-independent Kalman filter were identical to those obtained from CLS prediction when the CLS calibration model was used. However, if the calibration model noise was treated as a form of system dynamics noise, and a suitable value was used for Q (see eq. (A7) in the Appendix) in the Kalman filter, improved estimates resulted. In fact, these estimates tracked the estimates obtained from filtering with noise-free calibration models.
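The equivalence of the time-independent scalar Kalman filter and CLS prediction can be checked numerically. The sketch below assumes F = I, a zero initial guess with a diffuse covariance, and one scalar measurement per sensor channel with H(k) equal to row k of K. With q = 0 it reproduces the CLS estimate ĉ = (KᵀK)⁻¹Kᵀa to within round-off. The optional `q` argument adds Q = q·I at every step. It is an assumed, simplified stand-in for treating calibration-model noise as system noise, and the Q values used in Table 1 are not reproduced here. Function names are hypothetical.

```python
import numpy as np


def kf_predict(K, a, R, q=0.0, p0=1e6):
    """Estimate concentrations from one spectrum `a` (channels,) given the
    response matrix K (channels x components) with a scalar Kalman filter."""
    n_comp = K.shape[1]
    x = np.zeros(n_comp)              # X(0|0)
    P = p0 * np.eye(n_comp)           # P(0|0), diffuse (assumed)
    Q = q * np.eye(n_comp)            # simplified system noise (assumed form)
    for k in range(K.shape[0]):
        P = P + Q                     # (A2) with F = I
        h = K[k]
        s2 = h @ P @ h + R
        g = P @ h / s2                # (A3)
        x = x + g * (a[k] - h @ x)    # (A4)
        P = P - np.outer(g, h @ P)    # (A5)
    return x


def cls_predict(K, a):
    """Classical least-squares prediction c = (K^T K)^-1 K^T a."""
    return np.linalg.lstsq(K, a, rcond=None)[0]
```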


TABLE 2

Estimation of component concentrations in presence of unmodelled response

Method | Prediction set S/N (max.) | Calibration model | R | Q | PRESS*
CLS | 39 | CLS | - | - | 3.64
CLS | 1944 | CLS | - | - | 3.86
CLS | 1944 | S.R. | - | - | 2.67
AKF** | 39 | S.R. | 2.6 × 10^-4 | 1.0 × 10^-? | 1.00
AKF | 1944 | S.R. | 2.6 × 10^-4 | 1.0 × 10^-14 | 0.13
AKF | 39 | CLS | 2.6 × 10^-4 | 1.0 × 10^-4 | 2.21
AKF | 1944 | CLS | 9.5 × 10^-4 | 1.0 × 10^-5 | 0.37
PLS*** | 39 | PLS† | - | - | 3.20
PLS | 1944 | PLS | - | - | 2.17
PCR†† | 39 | PCR††† | - | - | 3.25
PCR | 1944 | PCR | - | - | 2.26

* Modified to reflect unmodelled component presence. PRESS was calculated for the accuracy of prediction of all modelled components.

** AKF: innovations variance-based adaptive filtering, using an innovations limit of 3σ_ν(k), and reset for covariance upon restart of 3 × 10^-?.

*** PLS: partial least squares prediction, using the algorithm given in Geladi and Kowalski (2).

† PLS: partial least-squares calibration, with the model defined by cross-validation. A five-factor model was used in these fits.

†† PCR: principal components regression, using the algorithm given in Geladi and Kowalski (2).

††† PCR: principal components calibration, with the model defined by the first three eigenvectors of the calibration data scatter matrix.
Calibration model noise is 'explained' by treating it as a form of dynamic noise in the sequential regression, and increasing Q to a realistic value prevents overfitting of the noisy model to the data. In general, Kalman filtering of the prediction sets produced results that were superior to those obtained from CLS prediction, unless CLS prediction was carried out with noise-free calibration models.

Estimation in the presence of unmodelled components

When an extra, non-calibrated component is added to the set of species producing the measured responses, the prediction error increases. Fig. 1 shows the responses of four components; the calibration model included concentrations and responses for only the first three, since component 4 was absent from the calibration. As indicated in Table 2, the presence of this unanticipated and uncalibrated response produces a significant decrease in the accuracy of estimation of the well-modelled components. Components 1 and 2, whose responses show significant overlap with the unmodelled response of component 4, carry the largest error in estimation. Component 3, with a response that is somewhat separated from the unmodelled component, still carries some error in estimation. Further, the error in estimation is present regardless of the calibration method employed. Even methods based on regression of data onto factor-based calibration models are unable to compensate for the unmodelled component in the prediction step. The innovations variance-based adaptive filter, on the other hand, successfully compensates for most of the effects of the unmodelled component. It shows significantly less error in the estimated concentrations of components 1 and 2, and slightly less error, on average, in the estimation of component 3.

The results observed for partial least squares (PLS) and principal components regression (PCR) calibration were strikingly different from those observed for fitting by other means. PRESS values obtained from fitting the calibration model to the validation data set were almost independent of the noise contained in the validation data, and they were considerably better than all but those produced by adaptive filtering. These results can be explained by noting that PLS and PCR methods rely on a factor model for the multicomponent system. This factor model is produced during the calibration, and it is set up to remove most of the noise present in the calibration data. The three PCR factors (or five PLS factors) fit validation data with or without noise equally well: the dominant error is the model error from the uncalibrated component. In this case, the model error is relatively small, and good estimates resulted from factor-based calibration. However, as in the CLS fitting, residuals from fitting of the validation data showed overfitting of components near the uncalibrated component.
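A factor-based calibration of the kind discussed above can be sketched as principal components regression. The calibration spectra are projected onto their first few right singular vectors, and concentrations are regressed on the resulting scores. This is a generic PCR sketch using NumPy's SVD without mean-centring, not the Geladi–Kowalski algorithm cited in Table 2. The function names and the choice of three factors (one per calibrated component) are assumptions.

```python
import numpy as np


def pcr_fit(A, C, n_factors=3):
    """Principal components regression (no mean-centring).
    A: calibration spectra (standards x channels); C: concentrations (standards x components).
    Returns loadings V (channels x factors) and regression matrix B (factors x components)."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V = Vt[:n_factors].T                         # leading eigenvectors of A^T A
    T = A @ V                                    # calibration scores
    B = np.linalg.lstsq(T, C, rcond=None)[0]     # regress concentrations on scores
    return V, B


def pcr_predict(V, B, A_new):
    """Project new spectra onto the retained factors and apply the regression."""
    return (A_new @ V) @ B
```

Because only three factors are kept, spectral noise orthogonal to them is discarded before the regression. This is consistent with the near-independence of the PCR PRESS values from validation-set noise. A response from an uncalibrated component that projects onto those factors is not removed.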

Sensitivity of innovations variance adaptive filter to noise estimates

As would be expected from its derivation, the adaptive filter based on innovations variance is somewhat sensitive to the values used for system and measurement noise. To obtain the results summarized in Table 2, values of 1 × 10^-9 and 2.6 × 10^-? were used for Q and R, respectively. These values, obtained from the innovations correlation adaptive filter as discussed above, very slightly overestimated the actual noise contributions, which were 0 and 2.5 × 10^-5 for the system and measurement noise variances. For state values near unity, the third term in eq. (26) will dominate at the start, when the state covariances are large, but the first term will quickly become the dominant term as the state covariance decreases. With state values near unity, the second term will, in general, always be small unless the system noise is sizable; this term could probably be neglected to decrease computational overhead, if desired. After the first 20 points, and on subsequent filter passes with better state estimates and decreased covariances, the estimate of the measurement noise variance will dominate the calculation of the filter innovations variance, and will therefore set the region where acceptable innovations will be found. Significant over- or underestimation of the measurement noise variance may therefore be expected to affect the performance of the adaptive filter markedly. To test the
TABLE 3

Sensitivity of innovations variance adaptive filter to noise variance estimates

Method | Prediction set S/N (max.) | Calibration model | R | Q | PRESS*
AKF | 1944 | S.R. | 1.0 × 10^-? | 1.0 × 10^-? | ?
AKF | 1944 | S.R. | 2.5 × 10^-? | 1.0 × 10^-? | 1.0?
AKF | 1944** | S.R. | 5.0 × 10^-? | 1.0 × 10^-? | 1.74
AKF | 1944** | S.R. | 1.0 × 10^-? | 1.0 × 10^-? | 1.5?
AKF | 1944 | S.R. | 2.5 × 10^-? | 1.0 × 10^-? | 1.06
AKF | 1944*** | S.R. | 5.0 × 10^-? | 1.0 × 10^-? | 1.74
AKF | 1944*** | S.R. | 1.0 × 10^-? | 1.0 × 10^-? | 4.05
AKF | 1944*** | S.R. | 1.5 × 10^-? | 1.0 × 10^-? | 1.19
AKF | 1944*** | S.R. | 5.0 × 10^-? | 1.0 × 10^-? | 1.57
AKF | 1944*** | S.R. | 2.5 × 10^-? | 1.0 × 10^-? | 1.06
AKF | 1944*** | S.R. | 5.0 × 10^-? | 1.0 × 10^-? | 1.75
AKF | 1944*** | S.R. | 1.0 × 10^-? | 1.0 × 10^-? | 1.51

* Modified PRESS; see Table 2 for details. True noise variances for the data were R = 2.6 × 10^-? and Q = 1.0 × 10^-?.

** Noise was generated from a uniform distribution for this data set.

*** Noise was generated from a normal distribution for this data set.


sensitivity of the filtering to errors in the estimation of the measurement and system noise variances, two data sets were prepared for examination, one with noise taken from a uniform distribution and one with noise taken from a normal distribution. Filtering was done on these data using the same filter model and the same initial guesses. Only the guesses for the system and measurement noise were changed from run to run.
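Late in filtering, when HᵀPH is small, σ_ν² ≈ R, so the width of the innovations gate is set almost entirely by the assumed R. The simulation below is a rough sketch under that assumption. It uses Gaussian innovations and a symmetric ±3σ gate, whereas the filter described earlier uses an asymmetric gate and the study also used uniform noise. It estimates how often correctly modelled innovations would be wrongly flagged when R is misestimated. The function name and sample size are illustrative.

```python
import numpy as np


def false_rejection_rate(R_true, R_assumed, n=100_000, k_sigma=3.0, seed=0):
    """Fraction of correctly modelled innovations, drawn from N(0, R_true), that
    fall outside +/- k_sigma * sqrt(R_assumed) and would therefore be gated out."""
    rng = np.random.default_rng(seed)
    nu = rng.normal(scale=np.sqrt(R_true), size=n)
    return np.mean(np.abs(nu) > k_sigma * np.sqrt(R_assumed))
```

Underestimating R by a factor of four narrows the gate to about 1.5 true standard deviations, so roughly 13% of good points would be discarded. Overestimating R widens the gate and lets model error pass undetected.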


Table 3 presents the results of that study. Fig. 3 shows the innovations after adaptation of the filter model for two noise levels. The unmodelled fourth component is visible even when the noise in the measurement approaches the size of the unmodelled component (case A). When noise levels are small (case B), good correction of the models for the unmodelled component is evident from the flat innovations over the data set.


TABLE 4

Estimation of noise variance in data

Model type* | Init. R | Est. R | Init. Q | Est. Q | True R | True Q
N | 1.0 × 10^-? | 4.1 × 10^-? | 0 | 1.1 × 10^-? | 4.0 × 10^-? | 0
N | 1.0 × 10^-? | 4.1 × 10^-? | 0 | 1.3 × 10^-? | 4.0 × 10^-? | 0
N | 1.0 × 10^-? | 4.1 × 10^-? | 0 | 1.3 × 10^-? | 4.0 × 10^-? | 0
N | 1.0 × 10^-? | 4.3 × 10^-? | 0 | 1.1 × 10^-? | 4.0 × 10^-? | 0
D | 3.1 × 10^-? | 4.1 × 10^-? | 0 | 1.1 × 10^-? | 4.0 × 10^-? | 0
D | 1.0 × 10^-? | 4.1 × 10^-? | 1.0 × 10^-? | 4.4 × 10^-? | 4.0 × 10^-? | 4.0 × 10^-?
N | 1.0 × 10^-? | 4.1 × 10^-? | 1.0 × 10^-? | 5.1 × 10^-? | 4.0 × 10^-? | 4.0 × 10^-?
D | 1.0 × 10^-? | 4.1 × 10^-? | ? | 3.4 × 10^-? | 4.0 × 10^-? | 4.0 × 10^-?
A | 1.0 × 10^-? | 4.4 × 10^-? | 1.0 × 10^-? | 4.4 × 10^-? | 4.0 × 10^-? | 4.0 × 10^-?

* N: complete model, with initial state guess of 0 and covariance guess of I; no drift was present. D: incomplete model, with random drift between measured spectra; initial guesses of state and covariance were as in case N above. A: incomplete model, with random drift and unmodelled components present; the initial state guess was within 10σ of the true state value, and the covariance guess was I.
Estimation of noise variance parameters

To test the accuracy of estimation of noise variances by the innovations correlation adaptive filter, this filter was applied to several spectra with different noise structures. As above, noise from uniform and normal distributions was used, and the adaptive filter was applied to the data. For this study, a complete spectral model was used, and no adaptation of noise means was necessary. Initial guesses for the noise variances were typically near zero, and guesses for the covariance of the noise estimates were taken as 1. Table 4 summarizes the results from this study. In general, better estimation of noise variance was seen when the initial guesses of noise variance were lower than the true values. These overly optimistic guesses converged quickly to accurate noise estimates. When initial estimates that overestimated the noise contributions were tried, convergence was slower, and the final estimates were not as accurate. For this system, estimates of the measurement variance R were found to be more accurate than those for the system noise Q, but both
Fig. 4. A. Simulated data showing linear drift in response and offset for the Drift3 data set. B. Estimated concentrations for data showing linear drift, from CLS regression with offset term. C. Estimated concentrations for data showing linear drift, from Kalman filter with drift model.

TABLE 5

Estimation of component concentrations from multicomponent data corrupted by linear drift

Method | Data set* | Prediction set S/N (max.) | Calibration model | Rel. error (%) Comp. 1 | Comp. 2 | Comp. 3 | PRESS
CLS | Drift1 | 39 | CLS** | 1.86 | 0.28 | -1.61 | 28.?
KF | Drift1 | 19 | S.R.*** | 1.85 | 0.28 | -1.61 | 28.65
CLS | Drift1 | 39 | S.R.** | 0.06 | 2.80 | -0.33 | 29.83
KF | Drift2 | 39 | S.R.*** | 1.07 | -0.04 | -2.14 | 0.09
CLS | Drift2 | 39 | S.R.** | ? | -0.09 | -2.12 | 0.25
KF | Drift3 | 39 | S.R.*** | 0.86 | 0.75 | -4.07 | 0.57
CLS | Drift3 | 39 | S.R.** | 1.67 | -0.06 | -1.94 | 0.60
KF | Drift3† | 39 | S.R.*** | 0.96 | 0.57 | -0.63 | 0.18

* Data sets had linear drift in all parameters, including the instrumental response functions for each component and the offset term. Drift1 noise variances were 1.0 × 10^-? for drift in all instrumental response parameters, and 1.0 × 10^-4 in the offset term. Drift2 noise variances were 1.0 × 10^-? in all instrumental response parameters and offset terms, with the exception of the response function for component 1, which had a drift variance of 1.0 × 10^-?. Drift3 had noise variances for all instrumental response variables of 1.0 × 10^-7, and drift in the offset term with variance 1.0 × 10^-?. All data sets had added measurement noise, with a variance of 2.5 × 10^-5.

** Regression model augmented to include an offset term.

*** Filter model augmented to include an offset term and drift parameters in responses and offset. Duplicate measurements made at each point.

† Four replicates were used in the measurement step for this run.




Chemometrics and Intelligent Laboratory Systems


estimated values were close enough to the true noise variances to be useful in filtering data.

PRESS values were generally lower for data with normally distributed noise than for data with uniform noise, as might be expected from the derivation of the adaptive filter.

Correction of drift in multicomponent prediction

Correction for linear drift was briefly studied for multicomponent prediction. For the three-component chemical system used in these studies, the state vector used for filtering was X = (C1 C2 C3 b d1 d2 d3 db)^T, where b is the offset term in the filter measurement model, and d_i describes drift in state i. Defining the state in this way meant that eight parameters were fitted to the multicomponent data. Duplicate data were used in the fit to ensure filter observability, as discussed above.

Data with linear drift in response and offset were generated to test this filter. A typical data set is shown in Fig. 4A. Results from this study are summarized in Table 5. Filtering results are not significantly better than those obtained from CLS regression with an offset term in the calibration model, but filtering typically produced lower PRESS values than CLS regression. Fig. 4B and C clarify the advantage gained by using the much more complex filter model instead of the simpler CLS model: significantly reduced fluctuations in the estimated concentrations from a drift-corrupted prediction set.

Fig. 5. A. Simulated data showing proportional drift in response and offset for the Drift4 data set. B. Estimated concentrations for data showing proportional drift, from CLS regression with offset term. C. Estimated concentrations for data showing proportional drift, from extended Kalman filter with drift model.


TABLE 6

Estimation of component concentrations from multicomponent data corrupted by proportional drift

Method | Data set* | Prediction set S/N (max.) | Calibration model | Rel. error (%) Comp. 1 | Comp. 2 | Comp. 3 | PRESS
EKF** | Drift4 | 39 | S.R.*** | 3.15 | -1.07 | -3.80 | 0.09
CLS | Drift4 | 39 | S.R. | 14.8 | 13.3 | 10.57 | 18.35
CLS | Drift4 | 39 | S.R.† | 14.5 | 13.1 | 9.78 | 18.39
EKF | Drift4 | 39 | S.R.†† | 3.80 | -0.34 | -3.10 | 0.09
EKF | Drift5 | 39 | S.R.†† | 1.39 | 0.31 | 0.51 | 0.20
CLS | Drift5 | 39 | S.R.† | 3.79 | 2.76 | 3.19 | 1.16
EKF | Drift5 | 39 | S.R.*** | 1.12 | 0.03 | 0.24 | 0.19

* Data sets had proportional drift in the response parameters for all components, along with random drift in the offset. Drift4 had proportional drift of 0.9905, and random drift variance of 1.0 × 10^-8. Drift5 had proportional drift of 1.0016, and random drift of 1.0 × 10^-?. Both sets had measurement noise with variance 2.5 × 10^-?.

** Extended Kalman filter, with filter models as defined by eqs. (14) and (15). The filter state included component concentrations and the proportional drift parameter.

*** Using correct concentrations as the initial guess, with a guessed proportional drift of 1.

† Using augmented regression model, with offset term.

†† Using zero as the initial guess of component concentrations, and a guessed proportional drift of 1.

It is clear from Fig. 4 that, when properly modelled, linear drift can be effectively removed from multicomponent systems. Part of the failure to achieve better drift correction with the filter lies in the weak observability of the filter drift model. With single measurements made on a drifting system, the filter drift model is not observable, and the estimates of component concentrations fluctuate wildly. Use of duplicate measurements ensures that the filter model is observable. It is only weakly observable, however, and the estimated component concentrations are not as stable as in the other filtering applications reported here. Collecting even more replicates improves the filter results, since this makes the filter model more observable. The size of the drift also affects the precision of the concentration estimates. As the drift magnitude increases, Q must also increase, and, as indicated in eq. (A5), the covariance of the final filter states must also increase. Thus, even though linear drift can be corrected and its effects removed with sufficient replicates, the precision of the predictions is degraded.
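The augmented-state linear drift filter can be sketched as follows. The state is X = (c_1 ... c_m, b, d_1 ... d_m, d_b)^T. The sketch assumes that each concentration and the offset advance by their drift rate between spectra, c(k) = c(k − 1) + d(k − 1). This assumed model stands in for the paper's eqs. (10) and (13), which are not reproduced here. Each spectrum is processed channel by channel, once per replicate, so duplicate scans simply contribute extra scalar updates. The function and argument names are hypothetical.

```python
import numpy as np


def linear_drift_filter(K, spectra, R, Q, p0=1.0):
    """Track concentrations and offset that drift linearly between spectra.

    K       : pure-component responses (channels x m)
    spectra : measured spectra, shape (n_spectra, n_replicates, channels)
    Q       : system noise covariance for the 2(m+1) states
    State   : [c_1..c_m, b, d_1..d_m, d_b]  (concentrations, offset, drift rates)
    Returns the estimated [c_1..c_m, b] after each spectrum.
    """
    n_chan, m = K.shape
    n = 2 * (m + 1)
    F = np.eye(n)
    F[: m + 1, m + 1:] = np.eye(m + 1)     # c(k) = c(k-1) + d(k-1) (assumed model)
    x = np.zeros(n)
    P = p0 * np.eye(n)
    estimates = []
    for reps in spectra:
        x = F @ x                           # (A1)
        P = F @ P @ F.T + Q                 # (A2)
        for z in reps:                      # replicate scans of the same spectrum
            for j in range(n_chan):
                h = np.zeros(n)
                h[:m] = K[j]                # responses at channel j
                h[m] = 1.0                  # offset term
                s2 = h @ P @ h + R
                g = P @ h / s2              # (A3)
                x = x + g * (z[j] - h @ x)  # (A4)
                P = P - np.outer(g, h @ P)  # (A5)
        estimates.append(x[: m + 1].copy())
    return np.array(estimates)
```

The drift rates never appear in H(k). They are learned only through the cross-covariances that F creates between successive spectra, which is one way to see why the drift model is weakly observable.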

The correction of proportional drift by use of an extended Kalman filter was also investigated. The same spectral models and calibration were used as in the other studies reported above, and proportional drift was introduced into the set of spectral data to be used for prediction. A typical data set is shown in Fig. 5A. The extended Kalman filter was applied to data with proportional drift, using the filter model described in eqs. (15) and (17) above. Linearization of the system dynamics was done as in eq. (18). As before, estimates of noise variance were obtained from the innovations correlation adaptive filter. Table 6 shows the results of fitting data with proportional drift.

Correction of proportional drift is better than correction of linear drift, but, unlike the results obtained from the linear drift study, the initial guess for the states used in filtering is important for convergence of the drift estimates. Even with poor initial guesses, though, very good compensation of drift occurs, as is apparent from the errors in the estimated results and the PRESS values. As with linear drift, correction of proportional drift results in degraded precision of the estimated component concentrations.
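The extended Kalman filter for proportional drift can be sketched in the same style. Eqs. (14)–(18) are not reproduced in this section, so the sketch assumes a simple model. Between successive spectra the apparent concentrations are multiplied by an unknown constant factor α, c(k) = α c(k − 1). The offset follows a random walk, and α is appended to the state. The product α·c makes the dynamics nonlinear, so the covariance is propagated with the Jacobian of the transition, evaluated at the current estimate. Names and default values are illustrative.

```python
import numpy as np


def proportional_drift_ekf(K, spectra, R, q_offset, alpha0=1.0, p0=1.0, p_alpha=1e-2):
    """Extended Kalman filter for responses that drift by a constant factor per
    spectrum (assumed model): c(k) = alpha*c(k-1), b(k) = b(k-1) + w_b, alpha constant.
    State = [c_1..c_m, b, alpha]; spectra shape (n_spectra, n_replicates, channels)."""
    n_chan, m = K.shape
    n = m + 2
    x = np.zeros(n)
    x[-1] = alpha0                          # initial guess of the drift factor
    P = p0 * np.eye(n)
    P[-1, -1] = p_alpha
    Q = np.zeros((n, n))
    Q[m, m] = q_offset                      # random-walk offset noise
    out = []
    for k, reps in enumerate(spectra):
        if k > 0:
            alpha = x[-1]
            F = np.eye(n)                   # Jacobian of the state transition
            F[:m, :m] *= alpha              # d(alpha*c)/dc = alpha*I
            F[:m, -1] = x[:m]               # d(alpha*c)/d(alpha) = c
            x = x.copy()
            x[:m] = alpha * x[:m]           # nonlinear state propagation
            P = F @ P @ F.T + Q             # linearized covariance propagation
        for z in reps:                      # replicate scans
            for j in range(n_chan):
                h = np.zeros(n)
                h[:m] = K[j]
                h[m] = 1.0                  # offset; alpha is not measured directly
                s2 = h @ P @ h + R
                g = P @ h / s2
                x = x + g * (z[j] - h @ x)
                P = P - np.outer(g, h @ P)
        out.append(x.copy())
    return np.array(out)
```

The Jacobian depends on the current concentration estimate, so a poor initial state makes the early linearization poor. This is consistent with the observation above that the initial guess matters for proportional drift.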

CONCLUSIONS 

This work has demonstrated the ease with which a CLS calibration may be accomplished by Kalman filtering. This approach to CLS calibration and prediction provides improved calibration models and improved prediction accuracy in noisy data. Another benefit is the ability to reject unmodelled component responses in the prediction of analyte concentrations in unknown mixtures. The correction of drift in the response of the chemical system during the prediction step is also possible, provided that a suitable model for the drift process can be generated. All these corrections represent relatively simple additions to the calibration model. The classical least squares calibration model is especially convenient because the Kalman filter has been derived for use with a causal measurement model. Given the general definition of the filter state, and the possibility of extending sequential regression to the inverse model, however, there is no difficulty in extending the time-series concepts of Kalman filtering to other calibration models, at least on an ad hoc basis.

All filtering routines presented here are relatively fast. They could be implemented in a suitable real-time language, if desired, for on-line use.
APPENDIX

Scalar Kalman algorithm

Propagation of filter states in time

$$X(k|k-1) = F(k)\,X(k-1|k-1) \qquad (A1)$$

Propagation of state covariance in time

$$P(k|k-1) = F(k)\,P(k-1|k-1)\,F^{T}(k) + Q(k) \qquad (A2)$$

Kalman gain

$$K(k) = P(k|k-1)\,H(k)\left[H^{T}(k)\,P(k|k-1)\,H(k) + R(k)\right]^{-1} \qquad (A3)$$

State update

$$X(k|k) = X(k|k-1) + K(k)\left[z(k) - H^{T}(k)\,X(k|k-1)\right] \qquad (A4)$$

Covariance update

$$P(k|k) = P(k|k-1) - K(k)\,H^{T}(k)\,P(k|k-1) \qquad (A5)$$

Initial guesses

$$X(0|0) = X_0, \qquad P(0|0) = P_0 \qquad (A6)$$

Noise assumptions

$$E[e(k)\,e^{T}(k)] = R(k), \qquad E[w(k)\,w^{T}(k)] = Q(k), \qquad E[e(k)\,w^{T}(j)] = 0 \ \text{for all } j, k \qquad (A7)$$
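Eqs. (A1)–(A5) translate directly into one filter cycle for a scalar measurement. The NumPy sketch below is a plain transcription with descriptive, hypothetical names. It omits the numerical safeguards that a production implementation might add, such as a Joseph-form covariance update.

```python
import numpy as np


def kalman_step(x, P, F, Q, h, z, R):
    """One cycle of the scalar Kalman algorithm, eqs. (A1)-(A5).

    x, P : X(k-1|k-1) and P(k-1|k-1)
    F, Q : state transition F(k) and system noise covariance Q(k)
    h    : measurement vector H(k); z : scalar measurement z(k); R : its variance
    Returns X(k|k), P(k|k) and the innovation."""
    x_pred = F @ x                              # (A1)
    P_pred = F @ P @ F.T + Q                    # (A2)
    s2 = h @ P_pred @ h + R                     # H^T P H + R, a scalar
    gain = P_pred @ h / s2                      # (A3)
    nu = z - h @ x_pred                         # innovation
    x_new = x_pred + gain * nu                  # (A4)
    P_new = P_pred - np.outer(gain, h @ P_pred) # (A5)
    return x_new, P_new, nu
```

For example, with F = I, Q = 0, H = 1 and a diffuse P(0|0), repeated measurements of a constant give an estimate that converges to their sample mean, and P(k|k) approaches R/k.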
Abstract

Bates, D.M. and Watts, D.G., 1991. Model building in chemistry using profile t and trace plots. Chemometrics and Intelligent Laboratory Systems, 10: 107–116.

The aim of model building is to determine the correct model, which means that the equation describing the phenomenon under study includes all the important factors in the correct form, and excludes unimportant factors. Practically, of course, we can only use the data at hand to fit a model which is 'adequate'. In linear and nonlinear regression, a model which is inadequate because an important factor is not included, or because a factor is incorporated in a wrong form, can often be detected by examining plots of the residuals. And in linear regression, models which include too many factors or too many parameters can often be detected by examining the parameter correlation matrix, or the parameter estimates and their standard errors. For nonlinear models, however, such linear approximation summaries are not reliable. To aid in the development of nonlinear models, we recommend using profile likelihood plots. The plots are simple to generate and appear to be especially useful in detecting models which could be simplified by removing factors or by equating parameters. In this paper we use data sets from chemical engineering to illustrate the value of profile t and profile trace plots in model building.


INTRODUCTION 

Linear regression

Consider  a  set  of  data  consisting  of  values  of  a 
set  of  factors,  x„pt  n  «  1,  2....,  N,  />“  1, 2,...,  P , 
and  the  corresponding  values  of  a  response,  y„ , 
which  are  well  described  by  a  model  of  the  form 

Y,“PiX*\  +  +  ■■■  +/ipx.,,+  Z„  (1) 


where $Y_n$ is a random variable corresponding to the observation for the $n$th case, and $Z_n$ is a random variable corresponding to the 'noise' affecting that case. The noise for each case is assumed to be normally distributed with mean 0 and variance $\sigma^2$, and independent from case to case. The model for all $N$ cases can be written in matrix form as

$$\mathbf Y = \mathbf X\boldsymbol\beta + \mathbf Z \qquad (2)$$
where $\mathbf Y$ is the $N \times 1$ vector of random variables representing the responses, $\mathbf X$ is the $N \times P$ derivative matrix, $\boldsymbol\beta$ is the $P \times 1$ vector of unknown parameter values, and $\mathbf Z$ is the vector of random variables representing the noise. The quantity $\mathbf X\boldsymbol\beta$ is called the expected value of $\mathbf Y$, and the model is termed linear because the derivative of the expected value with respect to any parameter does not depend on any of the parameters [1].

For a linear model of the form (2) with normally distributed noise, classical statistical analysis (see, for example, ref. 2) establishes that the 'best' estimate of $\boldsymbol\beta$, given data $\mathbf y$, can be formally written as


TABLE 1

Data for the CN⁻ Michael addition


$$\hat{\boldsymbol\beta} = (\mathbf X^{\mathsf T}\mathbf X)^{-1}\mathbf X^{\mathsf T}\mathbf y \qquad (3)$$

where $\hat{\boldsymbol\beta} = (\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_P)^{\mathsf T}$ is the least squares estimate. Furthermore, the associated estimator can be shown to be normally distributed with expected value $\boldsymbol\beta$ and variance-covariance matrix $(\mathbf X^{\mathsf T}\mathbf X)^{-1}\sigma^2$. The $p$th parameter estimate thus has estimated standard error


$$\mathrm{se}(\hat\beta_p) = s\sqrt{\{(\mathbf X^{\mathsf T}\mathbf X)^{-1}\}_{pp}} \qquad (4)$$

where $s^2 = S(\hat{\boldsymbol\beta})/(N - P)$ is the variance estimate given by the minimum sum of squares divided by its 'degrees of freedom', $N - P$. A $(1 - \alpha)$ confidence interval for that parameter is given by

$$-t(N-P;\alpha/2) \le \delta(\beta_p) \le t(N-P;\alpha/2) \qquad (5)$$

where

$$\delta(\beta_p) = \frac{\beta_p - \hat\beta_p}{\mathrm{se}(\hat\beta_p)} \qquad (6)$$


is the studentized parameter and $t(N-P;\alpha/2)$ is the value which isolates an area $\alpha/2$ under the right tail of Student's t distribution with $N - P$ degrees of freedom. A $(1 - \alpha)$ joint parameter inference region for the parameters is given by

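$$(\boldsymbol\beta - \hat{\boldsymbol\beta})^{\mathsf T}\mathbf X^{\mathsf T}\mathbf X(\boldsymbol\beta - \hat{\boldsymbol\beta}) \le P s^2 F(P, N-P;\alpha) \qquad (7)$$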


where $F(P, N-P;\alpha)$ is the value which isolates an area $\alpha$ under the right tail of Fisher's F distribution with $P$ and $N - P$ degrees of freedom.
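The linear-regression summaries in eqs. (3)-(7) can be computed directly. The following is a minimal Python sketch, not the authors' software: it forms $\mathbf X^{\mathsf T}\mathbf X$ explicitly and inverts it by Gauss-Jordan elimination, which is adequate for the small, well-conditioned problems discussed here (a QR decomposition would be preferable in production code). The helper names `ols_summary`, `studentized` and `in_joint_region` are hypothetical, and the critical values $t(N-P;\alpha/2)$ and $F(P, N-P;\alpha)$ are passed in as numbers rather than computed.

```python
import math


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_summary(X, y):
    """Return beta_hat (eq. 3), s^2, standard errors (eq. 4) and (X^T X)^{-1}."""
    N, P = len(X), len(X[0])
    xtx = [[sum(X[n][i] * X[n][j] for n in range(N)) for j in range(P)] for i in range(P)]
    xty = [sum(X[n][i] * y[n] for n in range(N)) for i in range(P)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(P)) for i in range(P)]
    resid = [y[n] - sum(X[n][p] * beta[p] for p in range(P)) for n in range(N)]
    s2 = sum(r * r for r in resid) / (N - P)  # minimum sum of squares / (N - P)
    se = [math.sqrt(s2 * xtx_inv[p][p]) for p in range(P)]
    return beta, s2, se, xtx_inv


def studentized(beta_p, beta_hat_p, se_p):
    """delta(beta_p) of eq. (6); |delta| <= t(N - P; alpha/2) gives interval (5)."""
    return (beta_p - beta_hat_p) / se_p


def in_joint_region(beta, beta_hat, X, s2, f_crit):
    """Inequality (7), using (b - b_hat)^T X^T X (b - b_hat) = |X (b - b_hat)|^2."""
    P = len(beta)
    d = [b - bh for b, bh in zip(beta, beta_hat)]
    q = sum(sum(row[p] * d[p] for p in range(P)) ** 2 for row in X)
    return q <= P * s2 * f_crit
```

For example, with the made-up data $x = (0, 1, 2, 3)$ and $y = (1.0, 2.9, 5.1, 7.0)$ the sketch gives $\hat{\boldsymbol\beta} = (0.97, 2.02)^{\mathsf T}$ and $s^2 = 0.009$; passing $F(2, 2; 0.05) = 19.0$ as `f_crit` tests membership of the 95% joint region (7).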


Gross and Hoz [3] obtained data on the relative reaction rate of the addition of CN⁻ to a series of 1,1-diaryl-2-nitroethylenes, for which the linear model

$$Y_n = \beta_1 + \beta_2 x_n + Z_n \qquad (8)$$

is appropriate. In eq. (8), $Y_n$ refers to the natural logarithm of the relative rate constant, $\ln(k/k_{\mathrm H})$, and $x$ to the substituent constant, $\sigma$. The data are listed in Table 1. The row with $x = 0.0$ corresponds to hydrogen.

For these data, the least squares estimates are $\hat{\boldsymbol\beta} = (-0.031, 4.13)^{\mathsf T}$ with parameter standard errors 0.036 and 0.19, respectively. The variance estimate is $s^2 = 0.00474$ with five degrees of freedom, and we have


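$$\mathbf X^{\mathsf T}\mathbf X = \begin{pmatrix} 7.00000 & 0.91800 \\ 0.91800 & 0.25252 \end{pmatrix}, \qquad (\mathbf X^{\mathsf T}\mathbf X)^{-1} = \begin{pmatrix} 0.27302 & -0.99254 \\ -0.99254 & 7.56837 \end{pmatrix}$$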


Joint  confidence  regions  are  ellipses,  as  shown  in 
Fig.  1. 
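As a check on the arithmetic, the standard errors quoted above follow from eq. (4) using only the printed $s^2$ and the diagonal of $(\mathbf X^{\mathsf T}\mathbf X)^{-1}$. This short Python sketch also forms the parameter correlation and the 95% intervals (5). The value $t(5; 0.025) \approx 2.571$ is taken from standard tables; it is an input here, not something stated in the text.

```python
import math

# Summary values printed above for the CN- Michael addition data (N = 7, P = 2).
beta_hat = (-0.031, 4.13)
s2 = 0.00474
xtx_inv = ((0.27302, -0.99254), (-0.99254, 7.56837))
t_crit = 2.571  # t(5; 0.025) from standard tables (assumed input)

s = math.sqrt(s2)
# Eq. (4): se(beta_p) = s * sqrt({(X^T X)^{-1}}_pp)
se = [s * math.sqrt(xtx_inv[p][p]) for p in range(2)]
# Correlation between the two estimates, from the same matrix.
corr = xtx_inv[0][1] / math.sqrt(xtx_inv[0][0] * xtx_inv[1][1])
# Eq. (5): |delta(beta_p)| <= t  <=>  beta_hat_p -/+ t * se_p
intervals = [(b - t_crit * e, b + t_crit * e) for b, e in zip(beta_hat, se)]
```

It reproduces standard errors of about 0.036 and 0.19, gives a correlation of about -0.69 between the two estimates, and an interval for $\beta_2$ of roughly (3.64, 4.62); the interval for $\beta_1$ covers zero.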


Nonlinear  regression 

Now  consider  a  model  of  the  form 

$$Y_n = f(\boldsymbol\theta, \mathbf x_n) + Z_n \qquad (9)$$

where $f$ is a nonlinear expectation function, $\mathbf x_n$ is a vector of regressor variables for the $n$th case, and $\boldsymbol\theta = (\theta_1, \ldots, \theta_P)^{\mathsf T}$ is a $P \times 1$ vector of unknown parameters. (We use $\boldsymbol\theta$ to emphasize the difference from linear models. As before, the disturbances $Z_n$ are assumed to be normal $(0, \sigma^2)$ and independent. A model $f(\boldsymbol\theta, \mathbf x_n)$ is nonlinear if at least one derivative of the expectation function with respect to at least one of the parameters involves at least one of the parameters [1].) For example, the Michaelis-Menten model for enzyme kinetics, $f = \theta_1 x/(\theta_2 + x)$, is deemed nonlinear because the derivative $\partial f/\partial\theta_1 = x/(\theta_2 + x)$ involves $\theta_2$.

Unlike the linear model, eq. (1), there are no analytic results for the estimates and their distributions. Indeed, there is not even an explicit solution for the least squares estimates; to find the least squares estimates, we must resort to search or iterative techniques. Properties of the estimates are usually assumed to be well represented by linear approximations evaluated at the least squares estimates $\hat{\boldsymbol\theta}$. For example, the linear approximation variance-covariance matrix is taken to be $(\hat{\mathbf V}^{\mathsf T}\hat{\mathbf V})^{-1}s^2$, where $s^2 = S(\hat{\boldsymbol\theta})/(N - P)$ is the variance estimate, $\hat{\mathbf V} = \partial\boldsymbol\eta/\partial\boldsymbol\theta^{\mathsf T}$ is the derivative matrix evaluated at $\hat{\boldsymbol\theta}$, and $\boldsymbol\eta(\boldsymbol\theta) = (f(\boldsymbol\theta, \mathbf x_1), \ldots, f(\boldsymbol\theta, \mathbf x_N))^{\mathsf T}$ is the vector of function values evaluated at the design points.


The linear approximation standard error for the parameter $\theta_p$ is, by analogy with eq. (4),

$$\mathrm{se}(\hat\theta_p) = s\sqrt{\{(\hat{\mathbf V}^{\mathsf T}\hat{\mathbf V})^{-1}\}_{pp}} \qquad (10)$$


and a linear approximation $(1 - \alpha)$ marginal confidence interval for $\theta_p$ is, by analogy with eq. (5),

$$-t(N-P;\alpha/2) \le \delta(\theta_p) \le t(N-P;\alpha/2) \qquad (11)$$


where the studentized parameter is defined by analogy with eq. (6)

$$\delta(\theta_p) = \frac{\theta_p - \hat\theta_p}{\mathrm{se}(\hat\theta_p)} \qquad (12)$$


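Finally, a linear approximation $(1 - \alpha)$ joint parameter inference region for the parameters is taken to be

$$(\boldsymbol\theta - \hat{\boldsymbol\theta})^{\mathsf T}\hat{\mathbf V}^{\mathsf T}\hat{\mathbf V}(\boldsymbol\theta - \hat{\boldsymbol\theta}) \le P s^2 F(P, N-P;\alpha) \qquad (13)$$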

Unfortunately, linear approximation inference regions are not trustworthy [1].
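To make the linear approximation summaries (10)-(13) concrete, here is a minimal Gauss-Newton sketch for the Michaelis-Menten model in plain Python. It is one of several possible iterative schemes (step halving is added for robustness), not the specific algorithm used by the authors, and the function names are hypothetical. The data in the test are synthetic, generated from assumed values $\boldsymbol\theta = (200, 0.05)$ plus fixed perturbations. The standard errors it returns are exactly the linear approximation quantities whose reliability the text questions.

```python
import math


def mm(theta, x):
    """Michaelis-Menten expectation function f = theta1 * x / (theta2 + x)."""
    return theta[0] * x / (theta[1] + x)


def mm_grad(theta, x):
    """One row of the derivative matrix V: (df/dtheta1, df/dtheta2)."""
    d1 = x / (theta[1] + x)  # involves theta2, which is what makes the model nonlinear
    d2 = -theta[0] * x / (theta[1] + x) ** 2
    return d1, d2


def sse(theta, xs, ys):
    """Sum of squares S(theta)."""
    return sum((y - mm(theta, x)) ** 2 for x, y in zip(xs, ys))


def normal_eqs(theta, xs, ys):
    """Entries of V^T V and V^T r at theta, with residuals r = y - f."""
    a11 = a12 = a22 = g1 = g2 = 0.0
    for x, y in zip(xs, ys):
        v1, v2 = mm_grad(theta, x)
        r = y - mm(theta, x)
        a11 += v1 * v1
        a12 += v1 * v2
        a22 += v2 * v2
        g1 += v1 * r
        g2 += v2 * r
    return a11, a12, a22, g1, g2


def gauss_newton(theta, xs, ys, max_iter=100, tol=1e-10):
    """Gauss-Newton with step halving; returns theta_hat, s^2 and linear-approximation se."""
    theta = list(theta)
    for _ in range(max_iter):
        a11, a12, a22, g1, g2 = normal_eqs(theta, xs, ys)
        det = a11 * a22 - a12 * a12
        step = [(a22 * g1 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det]
        current, lam = sse(theta, xs, ys), 1.0
        while lam > 1e-8:  # halve the increment until S(theta) does not increase
            trial = [t + lam * d for t, d in zip(theta, step)]
            if sse(trial, xs, ys) <= current:
                break
            lam /= 2
        else:
            break  # no acceptable step: treat the current theta as converged
        theta = trial
        if all(abs(lam * d) <= tol * (abs(t) + tol) for d, t in zip(step, theta)):
            break
    s2 = sse(theta, xs, ys) / (len(xs) - 2)
    a11, a12, a22, _, _ = normal_eqs(theta, xs, ys)
    det = a11 * a22 - a12 * a12
    # Eq. (10): square roots of the diagonal of (V^T V)^{-1} s^2
    se = [math.sqrt(s2 * a22 / det), math.sqrt(s2 * a11 / det)]
    return theta, s2, se
```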


Profile  t  plots 

Because the sampling theory approach is not adequate for nonlinear regression, it is necessary to use a more general approach based on the likelihood function. Fortunately, for noise which is independently normally distributed with constant variance, the likelihood function depends primarily on the sum of squares function

$$S(\boldsymbol\theta) = (\mathbf y - \boldsymbol\eta(\boldsymbol\theta))^{\mathsf T}(\mathbf y - \boldsymbol\eta(\boldsymbol\theta)) \qquad (14)$$

and so drawing inferences about the parameters reduces to summarizing the sum of squares function efficiently and meaningfully.

For linear models of the form of eq. (2), the sum of squares function is quadratic in $\boldsymbol\beta$, and so contours of constant likelihood, which correspond to contours of constant relative plausibility of parameter values, are concentric ellipsoids. For example, the elliptical confidence regions in Fig. 1 are also sum of squares contours. To summarize the likelihood function for a linear model, then, we only need to specify the (common) center of the ellipsoids ($\hat{\boldsymbol\beta}$), and their size and orientation. This can be done mathematically for any number of parameters, and essentially reduces to the summary eq. (7), but visualizing the joint region in more than three dimensions is very difficult.

For nonlinear models, the sum of squares surface is not quadratic, and so the problem becomes one of interpreting or summarizing a complicated surface in multiple dimensions. To do this, we focus on the characteristics of the sum of squares function when viewed in one or two dimensions.

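A useful view in one dimension is given by its 'shadow' in that dimension: the profile sum of squares function. For a model of the form (2) or (9), the profile sum of squares function for the $p$th parameter can be written

$$\tilde S(\theta_p) = \min_{\boldsymbol\theta_{-p}} S\big((\theta_p, \boldsymbol\theta_{-p}^{\mathsf T})^{\mathsf T}\big) = S\big((\theta_p, \tilde{\boldsymbol\theta}_{-p}^{\mathsf T})^{\mathsf T}\big) \qquad (15)$$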


The profile t function defined below is, in general, not equal to the studentized parameter function. It is similar to the J statistic used by Bliss and James [4].

The profile t function is valuable because plots of the profile t function versus the studentized profile parameter provide useful information on the nonlinearity of an estimation situation. This is because, for a linear model, a plot of $\tau(\beta_p)$ versus the studentized profile parameter $\delta(\beta_p)$ is a straight line through the origin with unit slope. Departures of the profile t plot of $\tau(\theta_p)$ versus $\delta(\theta_p)$ from the 45 degree line reveal nonlinearity in the parameter, and determining where $\tau(\theta_p)$ intersects the horizontal lines at heights $\pm t(N-P;\alpha/2)$ determines an accurate nominal $(1 - \alpha)$ likelihood interval for $\theta_p$.

Profile  traces 


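where the trace vector

$$\tilde{\boldsymbol\theta}_{-p}(\theta_p) = (\tilde\theta_1, \ldots, \tilde\theta_{p-1}, \tilde\theta_{p+1}, \ldots, \tilde\theta_P)^{\mathsf T} \qquad (16)$$

is the least squares estimate of $\boldsymbol\theta_{-p}$ conditional on the profile parameter $\theta_p$. The notation $(\theta_p, \tilde{\boldsymbol\theta}_{-p}^{\mathsf T})^{\mathsf T}$ denotes the vector with elements

$$(\tilde\theta_1, \ldots, \tilde\theta_{p-1}, \theta_p, \tilde\theta_{p+1}, \ldots, \tilde\theta_P)^{\mathsf T}$$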

For a linear model, the profile sum of squares function is a parabola, and can be written in terms of the studentized parameter as

$$\tilde S(\beta_p) = S(\hat{\boldsymbol\beta}) + s^2\,\delta^2(\beta_p) \qquad (17)$$

By rearranging this equation, we can write

$$\delta(\beta_p) = \operatorname{sign}(\beta_p - \hat\beta_p)\,\sqrt{\tilde S(\beta_p) - S(\hat{\boldsymbol\beta})}\,/\,s = \tau(\beta_p) \qquad (18)$$

where $\tau(\beta_p)$ is the profile t function. That is, for a linear model, the profile t function is identically equal to the studentized parameter function.

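For a nonlinear model, the profile t function is defined as

$$\tau(\theta_p) = \operatorname{sign}(\theta_p - \hat\theta_p)\,\sqrt{\tilde S(\theta_p) - S(\hat{\boldsymbol\theta})}\,/\,s \qquad (19)$$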
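For a model with two parameters the profile t function can be computed by brute force. The sketch below, in plain Python with hypothetical function names, profiles the slope of a straight-line model $Y_n = \beta_1 + \beta_2 x_n + Z_n$: for each fixed $\beta_2$ the conditional least squares $\tilde\beta_1$ is the mean of $y_n - \beta_2 x_n$, and $\tau$ follows from the profile sum of squares as in eq. (18). Because the model is linear, $\tau(\beta_2)$ should coincide with the studentized parameter $\delta(\beta_2)$, which the test checks on made-up data.

```python
import math


def fit_line(xs, ys):
    """Least squares fit of y = b1 + b2 x; returns b1, b2, S(b_hat), s and se(b2)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b2 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b1 = ybar - b2 * xbar
    s_min = sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(s_min / (n - 2))
    return b1, b2, s_min, s, s / math.sqrt(sxx)


def profile_t_slope(b2_fixed, xs, ys):
    """Profile t value tau(b2_fixed) and the trace value b1~(b2_fixed)."""
    _, b2_hat, s_min, s, _ = fit_line(xs, ys)
    # Conditional least squares for b1 with b2 held fixed: the trace value.
    b1_tilde = sum(y - b2_fixed * x for x, y in zip(xs, ys)) / len(xs)
    s_prof = sum((y - b1_tilde - b2_fixed * x) ** 2 for x, y in zip(xs, ys))
    tau = math.copysign(math.sqrt(max(s_prof - s_min, 0.0)) / s, b2_fixed - b2_hat)
    return tau, b1_tilde
```

For a nonlinear model the same structure applies, but the conditional minimization over the remaining parameters generally needs an iterative fit.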


Additional important information can be obtained from pairwise plots of the components of the trace vector versus the profile parameter. That is, we overlay plots of $\tilde\theta_q(\theta_p)$ versus $\theta_p$ and $\tilde\theta_p(\theta_q)$ versus $\theta_q$.

For a linear model, a plot of the studentized trace $\delta(\tilde\beta_q(\beta_p))$ versus $\delta(\beta_p)$ will be a straight line through the origin with slope given by the correlation between the parameters (derived from the appropriate elements of the matrix $(\mathbf X^{\mathsf T}\mathbf X)^{-1}$). Furthermore, the traces will intersect the parameter joint confidence ellipses at points of horizontal or vertical tangency of the ellipses. (See Fig. 1 for a plot of the profile traces for the CN⁻ Michael addition data.)
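The straight-line trace property can be checked numerically. This Python sketch (hypothetical names, made-up data) computes the trace $\tilde\beta_1(\beta_2)$ for a straight-line model, converts both axes to studentized values, and compares the slope of the trace with the correlation $\{(\mathbf X^{\mathsf T}\mathbf X)^{-1}\}_{12}/\sqrt{\{(\mathbf X^{\mathsf T}\mathbf X)^{-1}\}_{11}\{(\mathbf X^{\mathsf T}\mathbf X)^{-1}\}_{22}}$.

```python
import math


def line_summary(xs, ys):
    """Straight-line fit plus the elements of (X^T X)^{-1} for X = [1, x]."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b2 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b1 = ybar - b2 * xbar
    s = math.sqrt(sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    v11, v12, v22 = 1.0 / n + xbar ** 2 / sxx, -xbar / sxx, 1.0 / sxx
    return b1, b2, s, v11, v12, v22


def trace_b1(b2_fixed, xs, ys):
    """Profile trace: least squares b1 conditional on b2 = b2_fixed."""
    return sum(y - b2_fixed * x for x, y in zip(xs, ys)) / len(xs)


def studentized_trace_slope(xs, ys, h=0.1):
    """Slope of delta(b1~) against delta(b2), and the correlation it should equal."""
    b1, b2, s, v11, v12, v22 = line_summary(xs, ys)
    se1, se2 = s * math.sqrt(v11), s * math.sqrt(v22)
    d_lo = (trace_b1(b2 - h, xs, ys) - b1) / se1
    d_hi = (trace_b1(b2 + h, xs, ys) - b1) / se1
    slope = (d_hi - d_lo) / (2.0 * h / se2)
    corr = v12 / math.sqrt(v11 * v22)
    return slope, corr
```

With the printed $(\mathbf X^{\mathsf T}\mathbf X)^{-1}$ for the Michael addition data, the same correlation formula gives about -0.69.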

For a nonlinear model, the traces will be curved, but will still intersect the parameter joint likelihood regions at points of vertical and horizontal tangency. This information, together with information from the profile t plots, can be used to obtain accurate sketches of the joint regions, as described in ref. 1, Appendix 6. The traces and sketches reveal useful information about the interdependence of the parameter estimates caused by the form of the model for the expectation function and by the experimental design used in the investigation. Such information can provide valuable insights for inference, for model building, and for design, as demonstrated in the next section.
CODIMER HYDROGENATION


Tschernitz et al. [5] obtained and analyzed data on the vapor phase hydrogenation of mixed isooctenes over a solid supported nickel catalyst in a study to determine the most plausible mechanism for the reaction. The data consisted of the average reaction temperature $T$ (K), the average partial pressures (atm) of hydrogen ($x_1$), of codimer ($x_2$), and of hydrogenated codimer ($x_3$), and the reaction rate (lb/(h lb catalyst)).

Eighteen mechanisms were postulated for the reaction, and the most plausible one is found to be that in which the reaction between molecularly adsorbed hydrogen and adsorbed codimer is controlled by the surface reaction, so the reaction rate is

$$r = \frac{\theta_1\theta_2\theta_3 x_1 x_2}{(1 + \theta_2 x_1 + \theta_3 x_2 + \theta_4 x_3)^2}$$


(ref. 5, model d). The parameters $\theta_2$, $\theta_3$, and $\theta_4$ represent adsorption equilibrium constants and $\theta_1$ is the product of the adsorption velocity constant of hydrogen and codimer molecules, $k$, and $s_L$, where $s_L$ represents the 'activity' of the catalyst. The parameter $\theta_1$ also has the interpretation of the proportion of the surface area of the catalyst which is covered by the reactants.

It is assumed that each of the constants can be expressed as a function of temperature by means of an Arrhenius relation,

$$\theta_i = \exp\{\alpha_i/R + \beta_i/(RT)\}$$


TABLE 2

Parameter summary for codimer hydrogenation data, model d

Parameter   Estimate   St. error   Correlation
                                   φ1        φ2
φ1           -0.16      0.35
φ2           -2.78      0.37       -0.95
φ3           -1.08      0.31       -0.88      0.81
φ4           -2.97      0.87       -0.19      0.30
γ1           -2.66      1.79       -0.41      0.35
γ2            6.38      1.84        0.19     -0.14
γ3            5.41      1.22        0.14      0.02
γ4           17.75      2.48       -0.03     -0.03
where $R \approx 1.986$ cal mol$^{-1}$ K$^{-1}$ is the gas law constant, $\alpha$ is the effective entropy change, and $\beta$ is the negative effective enthalpy change, under the assumption that the catalyst activity remains unaltered with change in temperature.

The linear summary statistics for model d, using the data at all temperatures and the Arrhenius form for the velocity and equilibrium constants, are shown in Table 2. To improve the behavior of the estimates, we scaled and centered the data using $x_0 = 1000(1/T - 1/548)$ and, to avoid confusion, define $\phi_i = \alpha_i + \beta_i/548$ and $\gamma_i = \beta_i/1000$. Inspection of the parameter estimates and their standard errors in Table 2 suggests that $\phi_1$ and $\gamma_1$ could be zero, that $\phi_2$ and $\phi_4$ could be equal, and that $\gamma_3$ and $\gamma_2$ could be equal. However, we must be careful about incorporating model reductions which involve the $\phi$s, since they depend on the arbitrary centering temperature $T_0$ (here 548 K) and the associated $\gamma$s.
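The centering described above is an exact reparametrization, which is why conclusions about the $\phi$s depend on $T_0$ while those about the $\gamma$s do not. The Python sketch below verifies this numerically. It assumes that $R$ stays in the exponent, so that the centered form is $\theta_i = \exp\{(\phi_i + \gamma_i x_0)/R\}$; this follows algebraically from the definitions given, but how the authors actually scaled $R$ in their fits is not stated here. The function names and the parameter values in the test are arbitrary illustrations.

```python
import math

R = 1.986  # gas law constant as quoted in the text


def theta_arrhenius(alpha, beta, T):
    """theta = exp(alpha/R + beta/(R T)) in the original parameters."""
    return math.exp(alpha / R + beta / (R * T))


def centred_params(alpha, beta, T0=548.0):
    """phi = alpha + beta/T0 and gamma = beta/1000 for centring temperature T0."""
    return alpha + beta / T0, beta / 1000.0


def theta_centred(phi, gamma, T, T0=548.0):
    """Same constant written in terms of x0 = 1000 (1/T - 1/T0) (assumed form)."""
    x0 = 1000.0 * (1.0 / T - 1.0 / T0)
    return math.exp((phi + gamma * x0) / R)
```

Changing $T_0$ changes $\phi_i$ but leaves $\gamma_i$ and every fitted $\theta_i$ unchanged, so an equality such as $\phi_2 = \phi_4$ holds only for a particular choice of $T_0$.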

To demonstrate the kinds of information which are available from profiling, we present selected profile trace plots in Fig. 2 and discuss various aspects of the plots. Superimposed on the trace plots are sketches (dashed and solid closed curves) of the joint likelihood regions. The horizontal and vertical tangents of contours which are incompletely determined are shown by short bars on the traces. Also shown is the straight line (solid or dashed) corresponding to equal values of the parameters, with the × indicating the point at which both parameters equal zero.

From Fig. 2a it can be seen that $\phi_1$ could be zero, since the point corresponding to $\phi_1 = 0$ lies well within the vertical 'shadow' of the joint likelihood region and has a studentized value near zero. However, $\phi_2 = 0$ is not plausible, nor is $\phi_1 = \phi_2$.


Similarly, from Fig. 2c, $\gamma_1 = 0$ is eminently plausible, as is $\gamma_1 = \phi_4$. The latter is meaningless, however, since the two parameters are of different


Fig. 2. Selected profile traces for codimer hydrogenation, model d: (a) $\phi_2$ vs. $\phi_1$, (b) $\phi_3$ vs. $\phi_2$, (c) $\gamma_1$ vs. $\phi_4$, (d) $\gamma_3$ vs. $\gamma_2$, (e) and (f) further parameter pairs. The solid and dashed closed curves denote the 60, 80, 90, 95, and 99% joint likelihood boundaries. The solid or dashed straight line is the line of equality of two parameters, and the × indicates the point corresponding to $\theta_p = \theta_q = 0$. Short vertical and horizontal bars on the traces show the boundaries of contours which are not completely determined.
types. Comparing Figs. 2b and d, it can be seen that $\phi_3 = \phi_2$ is not plausible, but that $\gamma_3 = \gamma_2$ is highly plausible.

The high correlations between the parameters manifest themselves as long ridges in the 8-dimensional inference region, as illustrated in Figs. 2a and f, where only the 60% contour is complete. High parameter correlations often indicate overparametrization, but it appears that overparametrization can also manifest as large joint likelihood regions, especially unclosed ones, as shown in Fig. 2e, where there is negligible correlation but the joint region is very large. This indicates a subspace in which the sum of squares surface is very flat, which could be due to overparametrization.

Because the $\phi$s depend on the arbitrary centering temperature $T_0$, we first considered model reductions involving only the $\gamma$s. We refitted the model, first holding $\gamma_1$ at zero, and still found that $\phi_1 = 0$ was plausible, as was $\gamma_3 = \gamma_2$. Setting $\gamma_3 = \gamma_2$ and $\gamma_1 = 0$ gave the results shown in Table 3.

The residual sum of squares went from $2.2456 \times 10^{-4}$ with 32 degrees of freedom to $2.3543 \times 10^{-4}$ with 34 degrees of freedom, so there is not a statistically significant extra sum of squares. At this point we noted that two response values gave rise to large studentized residuals, and so these rows were deleted and the model refitted. The main effect of this was to reduce the residual sum of squares by about 30% and to reduce the parameter standard errors by about 15%.
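The extra sum of squares comparison can be written out explicitly. In nonlinear regression the resulting F ratio is only an approximate guide; the sketch below (plain Python, hypothetical function name) simply evaluates it for the residual sums of squares quoted above.

```python
def extra_ss_F(sse_full, df_full, sse_reduced, df_reduced):
    """Extra sum of squares F ratio for a reduced model nested in a full model."""
    extra_per_df = (sse_reduced - sse_full) / (df_reduced - df_full)
    return extra_per_df / (sse_full / df_full)


# Model d (32 df) versus model d with gamma_1 = 0 and gamma_3 = gamma_2 (34 df).
F = extra_ss_F(2.2456e-4, 32, 2.3543e-4, 34)
```

The ratio is about 0.77, well below $F(2, 32; 0.05) \approx 3.3$, consistent with the statement that the extra sum of squares is not significant.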

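Because $\gamma_1 = 0$, it is legitimate to follow through and set $\phi_1 = 0$ as well (with $\gamma_1 = \beta_1/1000 = 0$, $\phi_1 = \alpha_1$ no longer depends on the centering temperature). The results from this model, using the edited data, are shown in Table 4. The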

TABLE 3

Parameter summary for codimer hydrogenation data, model d with γ1 = 0 and γ3 = γ2

Parameter   Estimate   St. error   Correlation
                                   φ1       φ2       φ3       φ4       γ2
φ1           -0.27      ?
φ2           -2.71      0.36       -0.96
φ3           -0.96      0.29       -0.86     0.82
φ4           -2.89      0.85       -0.19     0.30     0.51
γ2            3.77      0.62       -0.36     0.43     0.57     0.42
γ4           17.78      2.40        0.00    -0.07    -0.22    -0.85     ?

TABLE 4

Parameter summary for edited codimer hydrogenation data, model d with φ1 = 0, γ1 = 0, and γ3 = γ2

Parameter   Estimate   St. error   99% likelihood interval   Correlation
                                   lower       upper         φ2       φ3       φ4       γ2
φ2           -3.03      0.09       -3.28       -2.77
φ3           -1.13      0.13       -1.48       -0.70         -0.12
φ4           -3.14      0.79       -6.90       -1.24          0.38     0.67
γ2            3.63      0.49        2.25        5.12          0.28     0.54     0.38
γ4           17.51      2.32       11.96       28.39         -0.20    -0.45    -0.87     0.07

profile trace plots (selected examples of which are shown in Fig. 3) still show considerable nonlinearity in the model/data set/parametrization combination. Parameters $\phi_4$ and $\gamma_4$ are the worst affected, both individually and jointly, as shown by the strongly curved profile t plots. The asymptotic behavior in the profile t plots causes the joint likelihood regions to be open at levels above 90%. Although the line $\phi_2 = \phi_4$ passes through the center of the joint likelihood region, Fig. 3a, it makes no sense to equate these parameters because they depend on the centering temperature and the $\gamma$ parameters. We conclude, therefore, that the simplest form of model d has been obtained.

It is useful to note that of the ten trace pair plots only three ($\phi_3$ vs. $\phi_2$, $\gamma_2$ vs. $\phi_2$, and $\gamma_2$ vs. $\phi_3$) gave closed contours at the 95 and 99% levels. Since the model has been pared to a sensible minimum number of parameters, this suggests that improvement in the behavior of the likelihood surface could only be achieved by incorporating more data. From the remaining seven trace plots, and from the profile t plots, Figs. 3e and f, it is clear that the open contours are due to lack of information on $\phi_4$ and $\gamma_4$. (In Fig. 3e the profile t approaches an asymptote as $\phi_4$ decreases, and in Fig. 3f the profile t approaches an asymptote as $\gamma_4$ increases. In Figs. 3a, b and c the contours are open because of $\phi_4$, and in Figs. 3b and d because of $\gamma_4$. The parameters $\phi_2$, $\phi_3$, and $\gamma_2$ are all well behaved in these plots and in the other trace pair plots.) A future design should therefore be constructed so as to provide more information on $\phi_4$ and $\gamma_4$, both individually and jointly, possibly using



Fig. 3. Selected profile t and trace plots for codimer hydrogenation, edited data, model d with $\phi_1 = 0$, $\gamma_1 = 0$, and $\gamma_3 = \gamma_2$: (a)-(d) selected profile trace pairs, (e) profile t for $\phi_4$, (f) profile t for $\gamma_4$. In (a)-(d), the solid and dashed closed curves denote the 60, 80, and 90% joint likelihood boundaries, and the short bars on the traces indicate horizontal and vertical tangents of the 95 and 99% contours which are incompletely determined. The dashed straight line is the line of equality of two parameters, and the ×, when present, indicates the point corresponding to $\theta_p = \theta_q = 0$. In (e) and (f), the solid line is the profile t function and the dashed line is the linear reference. Dotted lines show nominal 60, 80, 90, 95, and 99% likelihood intervals. The × is the point corresponding
subset designs as proposed by Box [6] and Hill and Hunter [7].


DISCUSSION 

The profile plot approach to summarizing the inferential results of a statistical analysis has much to recommend it. The computations for the profile t and profile trace plots are very efficient because we start from excellent estimates based on the previous calculation, and because the problem is of reduced dimension ($P - 1$). Also, at each value of the profile parameter we simultaneously generate the profile t value and the converged values of the trace vector, which provides the data to make the profile pair plots. And for all the calculations, only minor modifications to standard software are necessary.
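The reduced-dimension computation can be illustrated with the Michaelis-Menten model, where the conditional problem is especially simple: for fixed $\theta_2$ the expectation function is linear in $\theta_1$, so the trace value $\tilde\theta_1(\theta_2)$ has a closed form. The Python sketch below exploits this partially linear shortcut (general software would replace it by an iterative conditional fit started from the previous converged values), finds $\hat\theta_2$ by golden-section search on the profile sum of squares, and returns $(\theta_2, \tau(\theta_2), \tilde\theta_1(\theta_2))$ triples ready for a profile t plot and a trace plot. Function names, the search bounds and the synthetic test data are assumptions for illustration.

```python
import math


def cond_theta1(theta2, xs, ys):
    """Trace value: least squares theta1 with theta2 held fixed (closed form)."""
    g = [x / (theta2 + x) for x in xs]
    return sum(gi * yi for gi, yi in zip(g, ys)) / sum(gi * gi for gi in g)


def profile_sse(theta2, xs, ys):
    """Profile sum of squares S~(theta2) for f = theta1 * x / (theta2 + x)."""
    t1 = cond_theta1(theta2, xs, ys)
    return sum((y - t1 * x / (theta2 + x)) ** 2 for x, y in zip(xs, ys))


def golden_min(f, a, b, tol=1e-12):
    """Golden-section search for the minimiser of a unimodal f on [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def profile_t_theta2(grid, xs, ys, lo=1e-4, hi=1.0):
    """Return theta2_hat and (theta2, tau, trace theta1) triples, with tau as in eq. (19)."""
    def f(t2):
        return profile_sse(t2, xs, ys)

    t2_hat = golden_min(f, lo, hi)
    s_min = f(t2_hat)
    s = math.sqrt(s_min / (len(xs) - 2))  # P = 2 parameters
    rows = []
    for t2 in grid:
        tau = math.copysign(math.sqrt(max(f(t2) - s_min, 0.0)) / s, t2 - t2_hat)
        rows.append((t2, tau, cond_theta1(t2, xs, ys)))
    return t2_hat, rows
```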

Profile plots provide important detailed information about the estimation situation. In addition to providing accurate marginal likelihood regions for each parameter, the profile t plots reveal how nonlinear each parameter is. Similarly, the profile trace plots and the associated likelihood contour sketches provide useful information on the pairwise behavior of the parameters. Superimposing the line of equality on the trace plots is a simple but extremely effective aid to model building. Perhaps more importantly, however, the plots collectively provide insights into the experimental situation, so that steps can be taken to obtain more informative data [8].

Ratkowsky [9] has suggested rewriting rational model functions, such as the codimer model, by factoring the numerator parameters into the denominator term. For example, model d would become

$$r = \frac{x_1 x_2}{(\beta_1 + \beta_2 x_1 + \beta_3 x_2 + \beta_4 x_3)^2}$$

where $\beta_1 = (\theta_1\theta_2\theta_3)^{-1/2}$ and $\beta_j = \theta_j\beta_1$ for $j = 2, 3, 4$.
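Ratkowsky's rewriting leaves the fitted rates unchanged; only the parametrization differs. The Python sketch below checks this identity for arbitrary positive parameter values, assuming model d has the form $r = \theta_1\theta_2\theta_3 x_1 x_2/(1 + \theta_2 x_1 + \theta_3 x_2 + \theta_4 x_3)^2$ and taking $\beta_1 = (\theta_1\theta_2\theta_3)^{-1/2}$, $\beta_j = \theta_j\beta_1$. The temperature dependence of the constants is ignored here for simplicity, and the function names are hypothetical.

```python
import math


def rate_theta(theta, x):
    """Model d in the theta parametrization (assumed form)."""
    t1, t2, t3, t4 = theta
    x1, x2, x3 = x
    return t1 * t2 * t3 * x1 * x2 / (1.0 + t2 * x1 + t3 * x2 + t4 * x3) ** 2


def theta_to_beta(theta):
    """beta1 = (theta1 theta2 theta3)^(-1/2), beta_j = theta_j * beta1 for j = 2, 3, 4."""
    t1, t2, t3, t4 = theta
    b1 = 1.0 / math.sqrt(t1 * t2 * t3)
    return b1, t2 * b1, t3 * b1, t4 * b1


def rate_beta(beta, x):
    """Ratkowsky form: numerator parameters moved into the denominator."""
    b1, b2, b3, b4 = beta
    x1, x2, x3 = x
    return x1 * x2 / (b1 + b2 * x1 + b3 * x2 + b4 * x3) ** 2
```

The identity holds because $(\beta_1 + \beta_2 x_1 + \beta_3 x_2 + \beta_4 x_3)^2 = (1 + \theta_2 x_1 + \theta_3 x_2 + \theta_4 x_3)^2/(\theta_1\theta_2\theta_3)$.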

Profile plots for the $\beta$s are much better behaved than those for the $\theta$s, producing almost perfectly straight profile t plots and traces. One consequence of this is that marginal and joint linear approximation regions and summaries for the $\beta$ parameters are extremely accurate.

This illustrates a situation where linear approximation inferences for one set of parameters for a nonlinear regression model are much more accurate than for another set of parameters. However, the ease with which profile t and profile trace plots can be produced renders reparametrization considerably less important, since accurate marginal and joint likelihood regions can be obtained directly for the original parameters, which are usually more meaningful to the researcher.

For a univariate reparametrization, say $\phi_p = g(\theta_p)$, the profile t plot and associated profile traces for $\phi_p$ can be obtained directly from the profile t plot and associated profile traces for $\theta_p$: there is no need to reparametrize the model function or do any reestimation. This, of course, is a consequence of the invariance of the likelihood function.
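Because only the horizontal coordinate changes, re-expressing a profile for $\phi_p = g(\theta_p)$ is a relabelling of points already computed. A minimal Python sketch with a hypothetical helper name: for increasing $g$ the $\tau$ values carry over unchanged, while for decreasing $g$ their signs flip, since $\tau$ is defined with $\operatorname{sign}(\phi_p - \hat\phi_p)$. Trace values $\tilde\theta_q$ would be carried along in the same way.

```python
import math


def reparametrize_profile(profile, g, increasing=True):
    """Map (theta_p, tau) pairs to (g(theta_p), tau) without any refitting.

    For a monotone decreasing g, the sign of tau is flipped."""
    sign = 1.0 if increasing else -1.0
    return [(g(theta), sign * tau) for theta, tau in profile]


# Example: profile points for theta_p re-expressed on a log scale.
theta_profile = [(1.0, -1.2), (2.0, 0.0), (4.0, 1.5)]
log_profile = reparametrize_profile(theta_profile, math.log)
```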

Profiling provides extremely valuable information for experimental design, as demonstrated in the codimer hydrogenation example. There it was clearly evident from the profile t and trace plots that further data were required to provide better information about $\phi_4$ and $\gamma_4$. No such indication was evident from the linear summary statistics.

Finally, profiling can be applied to very general situations, including multiresponse estimation, as we have shown, and both univariate and multivariate time series analysis. The univariate situation has been discussed by Lam and Watts [10]. One can also use profiling to determine likelihood intervals for fitted values of the model function, by reparametrizing the model so that a new parameter is equal to the fitted value at a specified design point.
Abstract


ben-Avraham, D., 1991. Diffusion in disordered media. Chemometrics and Intelligent Laboratory Systems, 10: 117-122.

Diffusion in disordered media is anomalous in that it does not follow the regular Fickian law of diffusion in homogeneous systems. This has important implications for the physics of transport phenomena in disordered media. Fractals and scaling theory have been particularly illuminating in this area of research. An elementary exposition of anomalous diffusion in disordered media and its physical consequences, based on the concept of fractals, is presented.


INTRODUCTION 

Diffusion is among the most common phenomena in nature. In most areas of scientific research one could easily name several systems in which diffusion plays a decisive role. In homogeneous, ordered media diffusion obeys Fick's law,

$$\langle R^2 \rangle \propto t \qquad (1)$$

i.e., the mean square displacement of a diffusing particle increases proportionally to the time. This basic result is universal in that it applies whether diffusion takes place in one, two, or any dimension of regular Euclidean space [1]. We have become so accustomed to this universality that the realization that Fick's law is violated for diffusion in disordered media came as a big surprise. In nonhomogeneous, disordered systems the diffusion law becomes anomalous [2,3],

$$\langle R^2 \rangle \propto t^{2/d_w} \qquad (2)$$


with $d_w > 2$. This slowing down of the transport is caused by the delay of the diffusing particles in the dangling ends, bottlenecks and backbends existing in the disordered structure.
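In practice $d_w$ is read off from the slope of $\log\langle R^2\rangle$ against $\log t$, which equals $2/d_w$ by eq. (2) and 1 for Fickian diffusion. A minimal Python sketch of that estimate, using ordinary least squares on the logarithms (an assumption; real simulation data would also need care with short-time corrections), with a hypothetical function name:

```python
import math


def walk_dimension(times, msd):
    """Estimate d_w from <R^2> ~ t^(2/d_w) via the log-log least squares slope."""
    lx = [math.log(t) for t in times]
    ly = [math.log(m) for m in msd]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum((a - mx) ** 2 for a in lx)
    return 2.0 / slope  # slope 1 (Fick's law) corresponds to d_w = 2
```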

The concepts of fractals and fractal dimensionality have helped us understand better than ever before the physics of disordered systems such as porous earth, powders, amorphous materials, and aggregates. In this brief overview, we explain these concepts and how fractals are used to model disordered systems. We then show how diffusion is anomalous in disordered media and point out some of the physical consequences of this remarkable irregularity.

FRACTALS  AND  DISORDERED  MEDIA 

We begin with the definitions of the most basic properties of fractals [4]. Fractals are mathematical objects with a Hausdorff-Besicovitch dimension that is not an integer. They are most easily constructed in a recursive way. Thus, for example, the Koch curve (Fig. 1) is constructed by starting with a unit segment. The middle third section of this segment is erased and replaced by two other segments of equal length 1/3. Next, the same procedure is repeated for each of the four resulting segments (of length 1/3). This process is iterated ad infinitum. The limiting curve is of infinite length, yet it is confined to a finite region of the plane. The best way to characterize it is by using its Hausdorff-Besicovitch or fractal dimension, $d_f$. In a Koch curve magnified by a factor of three there fit exactly four of the original curves. Therefore its fractal dimension is given by $3^{d_f} = 4$, or $d_f = \ln 4/\ln 3 \approx 1.262$. The fractal dimension is a generalization of the integer dimensions that we associate with regular objects of classical Euclidean geometry.
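The recursive construction and the dimension calculation can be reproduced in a few lines of Python. This sketch (hypothetical helper names) replaces each segment by four segments of one third the length, confirms that the length grows as $(4/3)^n$, and evaluates $d_f = \ln 4/\ln 3$ from the self-similarity relation $3^{d_f} = 4$.

```python
import math


def koch_step(points):
    """Replace every segment of a polyline by four segments of one third its length."""
    c, s = math.cos(math.pi / 3), math.sin(math.pi / 3)
    new = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = (x1 - x0) / 3, (y1 - y0) / 3
        a = (x0 + dx, y0 + dy)
        b = (x0 + 2 * dx, y0 + 2 * dy)
        # Apex of the equilateral bump: rotate (dx, dy) by +60 degrees about point a.
        apex = (a[0] + c * dx - s * dy, a[1] + s * dx + c * dy)
        new.extend([a, apex, b, (x1, y1)])
    return new


def curve_length(points):
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def similarity_dimension(copies, magnification):
    """Solve magnification ** d = copies for d."""
    return math.log(copies) / math.log(magnification)
```

Starting from the unit segment `[(0.0, 0.0), (1.0, 0.0)]`, $n$ applications of `koch_step` give $4^n$ segments of total length $(4/3)^n$, which grows without bound.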

An important property of fractals which renders them particularly useful for the modelling of disordered media is their self-similarity. This can be seen by examining the Koch curve or the Koch snowflake, as it is frequently called. One can see a central object reminiscent of a snowman. To the right and to the left of this central snowman, there are two other snowmen, each being an exact reproduction, only smaller by a factor of 1/3. Each of the smaller snowmen has in turn two still smaller copies of itself to its right and left, and so on.

In recent years, it has become clear that many disordered systems are best characterized by a symmetry of invariance under dilatation [5]. This fundamental symmetry is essentially the same as the self-similarity of fractals, except that disordered systems occurring in nature exhibit this self-similarity only in a statistical sense. For these objects a fractal dimension $d_f$ is still easily defined by the scaling of their mass $M$ with their linear size $L$,

$$M \propto L^{d_f} \qquad (3)$$

The Koch curve can serve as a model for a linear polymer chain. Likewise, the Sierpinski sponge of Fig. 2 is an obvious model for porous media. It is constructed by subdividing a cube into 3 × 3 × 3 = 27 smaller cubes, and eliminating the central small cube and its six nearest neighbors. Each of the remaining 20 cubes is processed in the same way and the whole procedure is iterated indefinitely. Notice that the volume of the sponge is zero (each iteration keeps only 20/27 of the remaining volume), while its surface area is infinite. This agrees intuitively with the fact that its fractal dimension, $d_f = \ln 20/\ln 3 \approx 2.727$, lies between $d = 2$ and $d = 3$. Fractals have been used to model an immense variety of disordered systems. Nature


Fig. 1. Koch curve: (a) the iterative process by which it is constructed; (b) self-similarity: the central snowman is surrounded by two exact copies of itself.

Fig. 2. The Sierpinski sponge.


abounds  with  examples  of  self-similar  objects.  This 
has  been  made  clear  by  several  excellent  boohs 
published  in  recent  years  which  helped  popularize 
fractals. 
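A voxel version of the sponge makes the statement about zero volume and infinite area concrete. This sketch uses our own construction, and the digit test that decides which sub-cubes survive is an implementation choice. At generation g the unit cube becomes a 3^g × 3^g × 3^g boolean array. A voxel is removed whenever, at some level of the subdivision, two or more of its base-3 coordinates fall in the middle third; this picks out exactly the central cube and its six face neighbors. Counting voxels and exposed faces gives a volume of (20/27)^g and an area that grows as 2(20/9)^g + 4(8/9)^g.

```python
import math
import numpy as np


def sierpinski_sponge(generation):
    """Boolean voxel array (3**g per side) of the Sierpinski sponge at generation g."""
    n = 3 ** generation
    idx = np.arange(n)
    keep = np.ones((n, n, n), dtype=bool)
    for level in range(generation):
        # 1 where the base-3 digit at this level is 1, i.e. the middle third
        middle = ((idx // 3 ** level) % 3 == 1).astype(int)
        count = middle[:, None, None] + middle[None, :, None] + middle[None, None, :]
        # two or three coordinates in the middle third = the central cube or
        # one of its six face neighbours, which the construction removes
        keep &= count < 2
    return keep


def exposed_faces(voxels):
    """Number of unit faces separating a filled voxel from empty space."""
    padded = np.pad(voxels, 1, constant_values=False).astype(np.int8)
    return int(sum(np.count_nonzero(np.diff(padded, axis=ax)) for ax in range(3)))


for g in range(4):
    sponge = sierpinski_sponge(g)
    cubes = int(sponge.sum())
    volume = cubes / 27 ** g               # each small cube has volume 27**-g
    area = exposed_faces(sponge) / 9 ** g  # each face has area 9**-g
    print(f"g={g}: cubes={cubes}, volume={volume:.4f}, area={area:.3f}")
print("d_f =", math.log(20) / math.log(3))
```

The volume shrinks by 20/27 and the area grows by roughly 20/9 per generation. Both trends continue without bound as g increases.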


ANOMALOUS  DIFFUSION 

It is convenient to refer to a simple random walk as a model for diffusion [1]. In a simple discrete random walk the walker advances one step in a unit time. Each step is taken with equal probability to any of the nearest neighbors of the present site. Denote the steps of such a walker by u_1, u_2, ..., u_t. Then the mean square displacement at time t, ⟨R²(t)⟩, is given by

$$\langle R^2(t)\rangle = \Big\langle \Big|\sum_{i=1}^{t}\mathbf{u}_i\Big|^2\Big\rangle = \sum_{i=1}^{t}\langle \mathbf{u}_i^2\rangle + 2\sum_{i<j}\langle \mathbf{u}_i\cdot\mathbf{u}_j\rangle \qquad (4)$$

For regular lattices the correlations ⟨u_i · u_j⟩ (i ≠ j) are all zero. Thus, in homogeneous systems one has the usual result for normal diffusion, ⟨R²(t)⟩ = t. Disordered systems are characterized by irregular lattices. The nearest neighbors of a site are not distributed symmetrically, and the correlations ⟨u_i · u_j⟩ are not zero. This may lead to anomalous diffusion.
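Eq. (4) can be illustrated numerically for a regular lattice. The Monte Carlo sketch below samples simple random walks on the square lattice; the walker count, step count and seed are arbitrary choices. It splits ⟨R²(t)⟩ into two parts. The diagonal sum equals t exactly for unit steps, and the cross-correlation term should vanish within sampling noise.

```python
import numpy as np


def msd_decomposition(t, walkers=10_000, seed=1):
    """Monte Carlo estimate of the two terms of eq. (4) for a square-lattice walk."""
    rng = np.random.default_rng(seed)
    moves = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])
    steps = moves[rng.integers(0, 4, size=(walkers, t))]   # u_1..u_t, shape (walkers, t, 2)
    displacement = steps.sum(axis=1)                        # R(t) for every walker
    msd = float((displacement ** 2).sum(axis=1).mean())     # <R^2(t)>
    diagonal = float((steps ** 2).sum(axis=(1, 2)).mean())  # sum_i <u_i . u_i>
    cross = msd - diagonal                                  # 2 sum_{i<j} <u_i . u_j>
    return msd, diagonal, cross


msd, diagonal, cross = msd_decomposition(t=100)
print(msd, diagonal, cross)
```

On an irregular lattice the same decomposition would show a cross term that does not average to zero.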

Interestingly, a random walk is itself a statistically self-similar object. To see this, consider the random walk as it looks when one regards n consecutive steps as one single 'superstep'. Each of the supersteps is a random jump r on the lattice. The random supersteps are distributed according to some probability P_n(r) with a finite moment ⟨r²⟩ = n. In the limit n ≫ 1, P_n(r) tends to a Gaussian distribution; this is a simple result of the central limit theorem. It is evident that statistically the same random walk results for different values of n. The only difference between walks with n = n_1 and n = n_2 is that in the first case a step is performed every n_1 time units, and in the latter every n_2 time units. Also, the average length of a step is n_1^{1/2} and n_2^{1/2}, respectively, for the two walks. This means that if we scale time as t → λt and length as r → λ^{1/2} r, then two walks with n_2 = λn_1 are exactly equivalent under this scaling. Hence the simple random walk is statistically self-similar; in fact it is a statistical fractal. Upon dilation of space by a factor of λ^{1/2}, the number of steps (or 'mass' of the walk) increases by a factor of λ. Therefore the fractal dimension of a random walk is d_w = ln λ/ln λ^{1/2} = 2. It is interesting that random walks performed on disordered, but statistically self-similar, structures are still self-similar themselves, exactly as on regular lattices. The important difference is that the diffusion exponent d_w, which equals 2 for ordinary diffusion, is no longer equal to 2: diffusion is anomalous.

Fig. 3. A Sierpinski gasket drawn to the sixth generation.

We will now illustrate anomalous diffusion by considering a random walk on the Sierpinski gasket of Fig. 3. The Sierpinski gasket is perhaps the most widely used fractal lattice for theoretical applications. This is because it is a finitely ramified fractal, i.e., one needs to cut only a finite number of bonds to isolate a subset of the gasket. This property facilitates the exact analysis of various physical models, including the random walk problem.
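Two structural facts are used below: interior sites of the gasket have four nearest neighbors, and a sub-gasket is attached to the rest by a fixed number of bonds. Both are easy to check on explicit lattices. This sketch is our own illustration. It stores sites in integer triangular-lattice coordinates (a, b), generates the gasket recursively from its three corner triangles, and counts the bonds that must be cut to detach the top copy of the gasket.

```python
from collections import Counter, defaultdict


def _midpoint(u, v):
    return ((u[0] + v[0]) // 2, (u[1] + v[1]) // 2)


def gasket_adjacency(generation):
    """Nearest-neighbour graph of a Sierpinski gasket.

    Sites use integer triangular-lattice coordinates (a, b); the outer corners are
    (0, 0), (n, 0) and (0, n) with n = 2**generation.
    """
    n = 2 ** generation
    adj = defaultdict(set)
    stack = [((0, 0), (n, 0), (0, n), generation)]
    while stack:
        p, q, r, level = stack.pop()
        if level == 0:  # smallest triangle: add its three bonds
            for u, v in ((p, q), (q, r), (r, p)):
                adj[u].add(v)
                adj[v].add(u)
            continue
        mpq, mqr, mpr = _midpoint(p, q), _midpoint(q, r), _midpoint(p, r)
        # keep the three corner sub-triangles and drop the central one
        stack.append((p, mpq, mpr, level - 1))
        stack.append((mpq, q, mqr, level - 1))
        stack.append((mpr, mqr, r, level - 1))
    return dict(adj)


def bonds_isolating_top_copy(generation):
    """Number of bonds joining the top sub-gasket to the rest of the gasket (g >= 1)."""
    n = 2 ** generation
    adj = gasket_adjacency(generation)
    top = {site for site in adj if site[1] >= n // 2}
    return sum(1 for u in top for v in adj[u] if v not in top)


for g in range(1, 7):
    adj = gasket_adjacency(g)
    degrees = Counter(len(nbrs) for nbrs in adj.values())
    print(g, len(adj), dict(degrees), bonds_isolating_top_copy(g))
```

For every generation the cut count is 4: two bonds at each of the two corners the top copy shares with the lower copies. It does not depend on the size of the gasket, which is what "finitely ramified" means in practice.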

At each step the walker chooses randomly to move to one of its four nearest neighbors on the gasket (the three outer corners have only two). As stated above, we expect the walk to be statistically self-similar. The mean square displacement should then grow as ⟨R²⟩ ∝ t^{2/d_w}, where d_w is the anomalous diffusion exponent. Note that d_w is in fact the fractal dimensionality of the path of the random walker on the lattice. In Fig. 4 we show a plot of ln t against ln ⟨R²⟩^{1/2} as obtained from an exact enumeration of all possible walks. The slope of the resulting curve is d_w = 2.32 ± 0.01. This shows clearly the anomaly of diffusion on fractal lattices.

[Fig. 4: ln t against ln ⟨R²⟩^{1/2} from exact enumeration on the gasket; slope d_w = 2.32 ± 0.01.]

Fig. 5. Rescaling of the first passage time for exiting the gasket. The walker enters the gasket at the top vertex and (a) takes a time T to exit through the lower O-vertices. (b) The rescaled gasket, T → T'; A and B are the exit times from the internal (decimated) vertices to the lower O-vertices.
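Exact enumeration can be sketched by propagating the walker's full probability distribution one step at a time. This weights every possible walk exactly instead of sampling walks. The sketch is a minimal illustration, not the computation behind Fig. 4. The gasket generation (7), the start at a corner and the fitting window t = 20 to 2000 are our assumptions, chosen so that the walker does not reach the far edges of the finite gasket. Distances use the triangular-lattice metric R² = Δa² + ΔaΔb + Δb². The fitted exponent should come out clearly above the normal value 2 and close to the 2.32 quoted above. Small log-periodic oscillations make the exact value depend slightly on the window.

```python
import numpy as np


def _midpoint(u, v):
    return ((u[0] + v[0]) // 2, (u[1] + v[1]) // 2)


def gasket_bonds(generation):
    """Bonds of a Sierpinski gasket in integer triangular-lattice coordinates (a, b)."""
    n = 2 ** generation
    bonds, stack = set(), [((0, 0), (n, 0), (0, n), generation)]
    while stack:
        p, q, r, level = stack.pop()
        if level == 0:
            bonds.update(tuple(sorted(e)) for e in ((p, q), (q, r), (r, p)))
            continue
        mpq, mqr, mpr = _midpoint(p, q), _midpoint(q, r), _midpoint(p, r)
        stack += [(p, mpq, mpr, level - 1), (mpq, q, mqr, level - 1),
                  (mpr, mqr, r, level - 1)]
    return list(bonds)


def exact_enumeration_msd(generation, t_max, start=(0, 0)):
    """<R^2(t)> for t = 1..t_max from the exactly propagated walker distribution."""
    bonds = gasket_bonds(generation)
    sites = sorted({s for bond in bonds for s in bond})
    index = {s: i for i, s in enumerate(sites)}
    src = np.array([index[u] for u, v in bonds] + [index[v] for u, v in bonds])
    dst = np.array([index[v] for u, v in bonds] + [index[u] for u, v in bonds])
    degree = np.bincount(src, minlength=len(sites)).astype(float)  # 2 at corners, else 4
    da = np.array([s[0] - start[0] for s in sites], dtype=float)
    db = np.array([s[1] - start[1] for s in sites], dtype=float)
    r2 = da * da + da * db + db * db  # squared Euclidean distance on the triangular lattice
    prob = np.zeros(len(sites))
    prob[index[start]] = 1.0
    msd = np.empty(t_max)
    for t in range(t_max):
        # every site hands its probability in equal shares to its nearest neighbours
        prob = np.bincount(dst, weights=prob[src] / degree[src], minlength=len(sites))
        msd[t] = prob @ r2
    return msd


def fit_dw(msd, t_lo, t_hi, points=40):
    """d_w from the slope of ln<R^2> against ln t, since <R^2> ~ t**(2/d_w)."""
    t = np.unique(np.geomspace(t_lo, t_hi, points).astype(int))
    slope = np.polyfit(np.log(t), np.log(msd[t - 1]), 1)[0]
    return 2.0 / slope


msd = exact_enumeration_msd(generation=7, t_max=2000)
d_w = fit_dw(msd, 20, 2000)
print(msd[:2], d_w)
```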

One can exploit the finite ramification of the Sierpinski gasket to obtain an exact value of the exponent d_w in the following way. Consider the mean first passage time T to traverse a gasket unit from one of its vertices to either of the remaining two vertices O (Fig. 5a). One can then calculate the corresponding mean first passage time T' for exiting a gasket unit rescaled by a factor of 2 (Fig. 5b). This is done by making use of the Markov property of the random walk. Thus, T' equals the time T to exit the first gasket unit, plus A, the mean first passage time to leave the rescaled unit from then on. The same reasoning applies to the times A and B, the mean exit times starting from the decimated internal vertices. From an internal vertex the walker needs a time T to reach one of its four neighboring vertices, each chosen with probability 1/4, and reaching an O-vertex ends the walk. One therefore has

$$\begin{aligned} T' &= T + A \\ 4A &= 4T + A + B + T' \\ 4B &= 4T + 2A \end{aligned} \qquad (5)$$

The solution is T' = 5T (and A = 4T, B = 3T), which is the rescaling of time for a diffusion process on the gasket upon the rescaling of length by a factor of 2. Hence d_w = ln 5/ln 2 ≈ 2.322. Notice the agreement with the result obtained from exact enumeration. This anomalous diffusion is characteristic of all fractal lattices, as well as of statistically self-similar objects such as percolation clusters and aggregates [6].
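Eq. (5) is a small linear system, and solving it with exact rational arithmetic confirms the quoted solution. In this sketch the helper solve_linear is ours: plain Gauss-Jordan elimination on Fraction values. T is set to 1, which simply fixes the time unit.

```python
import math
from fractions import Fraction


def solve_linear(matrix, rhs):
    """Gauss-Jordan elimination with exact Fractions (small dense systems only)."""
    n = len(rhs)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


T = Fraction(1)  # mean time to cross one small gasket unit; fixes the time unit
# Unknowns (T', A, B); each row is one line of eq. (5) with the unknowns moved left.
T_prime, A, B = solve_linear(
    [[1, -1, 0],     # T' - A = T
     [-1, 3, -1],    # -T' + 3A - B = 4T   (from 4A = 4T + A + B + T')
     [0, -2, 4]],    # -2A + 4B = 4T       (from 4B = 4T + 2A)
    [T, 4 * T, 4 * T],
)
print(T_prime, A, B)  # 5 4 3
d_w = math.log(T_prime / T) / math.log(2)  # time rescales by T'/T when length doubles
print(d_w)
```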


ANOMALOUS  TRANSPORT  PHYSICS 

Diffusion is closely related to transport physics, and anomalous diffusion results in anomalous transport physics. An excellent example is the relation between diffusion and the conductivity of a medium. In homogeneous systems it is given by the Einstein relation

$$\sigma_{dc} = \frac{e^2 n}{k_B T}\, D \qquad (6)$$

where σ_dc is the d.c. conductivity, n is the carrier density, k_B T is the thermal energy and D is the diffusion constant

$$D \simeq \langle R^2 \rangle / t, \qquad t \gg 1 \qquad (7)$$

The carrier density n is proportional to the mass density of the bulk. For fractal substrates this scales as n ∝ R^{d_f−d}. The conductivity exponent μ is defined by the scaling of the conductivity with the linear size R, σ_dc ∝ R^{−μ}. From eqs. (2) and (7), D ∝ t^{2/d_w−1}. Writing D ∼ R²/t and using it in the Einstein relation of eq. (6), we get t ∝ R^{2−d+μ+d_f}. Comparing this to eq. (2) we obtain the relation

$$d_w = 2 - d + d_f + \mu \qquad (8)$$

This is to be compared to the classical conductivity exponent μ = 0 of homogeneous media (for which d_f = d and d_w = 2). It shows the anomalous conductivity that results from anomalous diffusion.
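Eq. (8) can be turned around to give μ once d_w and d_f are known. As a worked example (our illustration), take the Sierpinski gasket embedded in d = 2. It has d_f = ln 3/ln 2, since doubling its linear size triples its mass, and d_w = ln 5/ln 2 from eq. (5).

```python
import math


def conductivity_exponent(d_w, d_f, d):
    """Solve eq. (8), d_w = 2 - d + d_f + mu, for mu."""
    return d_w - 2 + d - d_f


# Homogeneous medium (d_f = d, d_w = 2): the classical value mu = 0.
print(conductivity_exponent(2.0, 3.0, 3))

# Sierpinski gasket in d = 2.
gasket_df = math.log(3) / math.log(2)
gasket_dw = math.log(5) / math.log(2)
mu = conductivity_exponent(gasket_dw, gasket_df, 2)
print(mu, math.log(5 / 3) / math.log(2))
```

The result, μ = ln(5/3)/ln 2 ≈ 0.737, corresponds to a resistance across the gasket that grows by a factor of 5/3 each time its size is doubled.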

A more fundamental consequence of anomalous diffusion arises when one looks at the density of states in a disordered substrate. The density of states is relevant for any physical phenomenon described by an equation of motion that contains the operator ∇². This includes, for example, electromagnetic, elastic and quantum phenomena. The density of states ρ(ε) is related to diffusion through P(0,t), the probability of a walker to be back at the origin at time t:

$$P(0,t) \propto \int_0^\infty \rho(\epsilon)\, e^{-\epsilon t}\, d\epsilon \qquad (9)$$

By the time t, a random walker has visited the sites within a volume R(t)^{d_f} ∝ t^{d_f/d_w}. Therefore the probability of returning to the origin scales as 1/R^{d_f} ∝ t^{−d_f/d_w}. Using this result in eq. (9), one finds

$$\rho(\epsilon) \propto \epsilon^{d_f/d_w - 1} = \epsilon^{d_s/2 - 1} \qquad (10)$$

where d_s is the spectral [7] or fracton [8] dimensionality for the density of states. This is similar to the usual expression for homogeneous media, ρ(ε) ∝ ε^{d/2−1}, except that d is replaced by the anomalous d_s = 2d_f/d_w.
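The step from eq. (9) to eq. (10) can be checked in the opposite direction. Inserting ρ(ε) ∝ ε^{d_s/2−1} into eq. (9) should give back P(0,t) ∝ t^{−d_s/2} = t^{−d_f/d_w}. The sketch below (our illustration) evaluates the integral numerically for the gasket, d_s = 2 ln 3/ln 5 ≈ 1.365. It first substitutes ε = y^{2/d_s}, which removes the integrable singularity at ε = 0, and then compares the result with the closed form Γ(d_s/2) t^{−d_s/2}.

```python
import math
import numpy as np


def spectral_dimension(d_f, d_w):
    """d_s = 2 d_f / d_w."""
    return 2.0 * d_f / d_w


def return_probability(t, d_s, cutoff=60.0, n=200_001):
    """Evaluate eq. (9) numerically for rho(eps) = eps**(d_s/2 - 1).

    With a = d_s/2 and eps = y**(1/a), rho(eps) d(eps) = dy / a, so the integral becomes
    (1/a) * int_0^inf exp(-t * y**(1/a)) dy, whose integrand is finite at y = 0.
    """
    a = d_s / 2.0
    y_max = (cutoff / t) ** a  # the integrand has fallen to exp(-cutoff) here
    y = np.linspace(0.0, y_max, n)
    f = np.exp(-t * y ** (1.0 / a)) / a
    dy = y[1] - y[0]
    return float(dy * (f.sum() - 0.5 * (f[0] + f[-1])))  # trapezoidal rule


d_s = spectral_dimension(math.log(3) / math.log(2), math.log(5) / math.log(2))  # gasket
for t in (10.0, 100.0, 1000.0):
    numeric = return_probability(t, d_s)
    closed_form = math.gamma(d_s / 2) * t ** (-d_s / 2)
    print(t, numeric, closed_form)
```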

As a final example of the physical consequences of anomalous diffusion, we would like to mention diffusion-reaction systems in contrived geometries. It is well known that the reaction rate in diffusion-limited reactions is proportional to the volume covered by a diffusing reactant particle per unit time (this is known as the Wiener sausage problem). Clearly, this is critically affected by the irregularities of diffusion when the substrate is a statistical fractal. This intriguing topic is discussed in detail in the paper by Kopelman et al. [9].
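The role of the explored volume can already be seen in one dimension, where the walk is compact. There the number of distinct sites visited, S_t, grows only as √(8t/π), instead of in proportion to t as on a three-dimensional lattice. The Monte Carlo sketch below uses arbitrary walker numbers, times and seed. It relies on the fact that in one dimension the visited set is an interval containing the origin.

```python
import math
import numpy as np


def mean_distinct_sites_1d(t, walkers=1000, seed=2):
    """Monte Carlo average of S_t, the number of distinct sites visited in t steps (1D)."""
    rng = np.random.default_rng(seed)
    paths = np.cumsum(rng.choice(np.array([-1, 1]), size=(walkers, t)), axis=1)
    # the visited set is an interval containing the origin: S_t = max - min + 1
    highest = np.maximum(paths.max(axis=1), 0)
    lowest = np.minimum(paths.min(axis=1), 0)
    return float((highest - lowest + 1).mean())


for t in (100, 500, 2000):
    s_t = mean_distinct_sites_1d(t)
    print(t, round(s_t, 1), round(math.sqrt(8 * t / math.pi), 1), round(s_t / t, 3))
```

The ratio S_t/t, the volume covered per step, keeps decreasing. This is the origin of the anomalously slow diffusion-limited kinetics in low dimensions.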


SUMMARY 

We have presented an elementary discussion of the basic properties of fractals and of how fractals are useful for modelling disordered media. Diffusion in disordered media was shown to be anomalous: rather than following Fick's law, ⟨R²⟩ ∝ t, it obeys ⟨R²⟩ ∝ t^{2/d_w}, where d_w is the anomalous diffusion exponent and depends on the specific characteristics of the substrate in question. We then discussed some of the dramatic consequences of anomalous diffusion, as manifested in bulk conductivity, the density of states, and reaction rates in diffusion-reaction systems. The interested reader is referred to more complete reviews and to the specialized literature of the field.
Discussion of "Diffusion in disordered media" by Daniel ben-Avraham

George H. Weiss

National Institutes of Health, Bethesda, MD 20892 (USA)


Professor ben-Avraham, in his lucid article, has indicated some of the simplest characterizations of transport in a disordered medium. What makes the general analysis of such problems so difficult is that the characteristic function cannot easily be used to generate explicit representations of the solution to problems in which the medium is not translationally invariant. Nevertheless, because phenomena related to disordered media arise naturally in a variety of scientific fields, the general area of diffusion in such media has become one of central interest in contemporary chemistry, mathematics and physics. A sampling of some of the many applications of the theory is to be found in the review by Alexander et al. [1] and in the proceedings of a meeting edited by Klafter et al. [2]. Excellent, more comprehensive reviews of the subject have been given by Haus and Kehr [3] and by Havlin and ben-Avraham [4].

Since one cannot, in general, find solutions to the equations describing transport in a disordered medium, how does one go about calculating some of the properties of anomalous diffusion? Naturally, in a field which has been so widely studied, a great many theoretical techniques have been tried, most of which lead to approximations to a solution rather than explicit solutions. While a precise definition of the term "explicit solution" may contain some ambiguity, the only nontrivial model of a disordered medium for which all of the interesting transport properties are basically known is one originally suggested by Sinai [5]. The exact solution is due to Kesten [6]. Sinai's model is that of a random walk on a one-dimensional lattice in which, on a given step, the random walker can move from site i to i + 1 with probability p_i, or to i − 1 with probability 1 − p_i. The p_i are assumed to be independent, identically distributed random variables which satisfy the conditions


$$E\left\{\ln\frac{p_i}{1-p_i}\right\} = 0, \qquad E\left\{\ln^2\frac{p_i}{1-p_i}\right\} \equiv \sigma^2 > 0 \qquad (1)$$


Let X_n be the location of the random walker at step n. Kesten shows that the random variable σ²X_n/ln² n converges in distribution, and finds an explicit representation of the distribution as an infinite series. Unlike the examples cited by ben-Avraham, the mean-squared displacement of the random walk satisfies

$$\frac{E\{X_n^2\}}{\ln^4 n} \to \text{constant} \qquad (2)$$

as n → ∞. There are many obvious generalizations of this model for which one would want to see a solution, but for which no exact results are available in either the mathematics or the physics literature. For example, it would be most desirable to extend these results for the Sinai model to analogous random walks in higher dimensions, and to remove the restriction that steps be to nearest neighbors only. Another useful generalization is that of obtaining an exact solution of the corresponding first passage problems for such random walks in the presence of absorbing boundaries.
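Simulations at accessible n cannot confirm the asymptotic statement of eq. (2), but a direct simulation does show how strongly the random environment slows the walk. This sketch is our own. The uniform distribution of p_i on [0.02, 0.98], the number of walkers and n are assumptions. Each walker receives its own environment, and the result is compared with an ordinary random walk, for which E{X_n²} = n.

```python
import numpy as np


def sinai_msd(n_steps, walkers, p_low, p_high, seed=3):
    """Monte Carlo E{X_n^2} for nearest-neighbour walks in i.i.d. random environments.

    p_i, the probability of stepping from i to i + 1, is uniform on [p_low, p_high];
    choosing p_high = 1 - p_low makes ln[p_i/(1 - p_i)] symmetric about zero, so the
    first condition of eq. (1) holds. p_low = p_high = 0.5 gives the ordinary walk.
    """
    rng = np.random.default_rng(seed)
    env = rng.uniform(p_low, p_high, size=(walkers, 2 * n_steps + 1))  # sites -n..n
    rows = np.arange(walkers)
    x = np.zeros(walkers, dtype=np.int64)
    for _ in range(n_steps):
        p_right = env[rows, x + n_steps]  # each walker reads its own environment
        x += np.where(rng.random(walkers) < p_right, 1, -1)
    return float(np.mean(x.astype(float) ** 2))


n = 4000
random_env = sinai_msd(n, walkers=300, p_low=0.02, p_high=0.98)
ordinary = sinai_msd(n, walkers=300, p_low=0.5, p_high=0.5)
print(random_env, ordinary, n)
```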
Related material by Solomon for the one-dimensional case has been presented in the mathematical literature [7]. In the physics literature, a more heuristic approach has been taken to find the asymptotic survival probability for a Sinai random walk on a finite line bounded by traps at either end [8]. Clearly, it would be most useful to have further models for transport in a random environment that can be solved exactly. Most analyses of such problems that have appeared are approximate, and one always likes to have a benchmark for purposes of comparison.

In the absence of general methods for solving problems of transport in disordered media, investigators have resorted to a large number of techniques, both approximate (which start from a rigorous formulation of the dynamics) and heuristic, which enable one to understand the dynamics of such processes. We will mention just two of these, because of their popularity (although not necessarily their accuracy) in any given problem. The first, rather general, method goes under the heading of the effective medium approximation, although there are many variants in the literature. To see the basic ideas behind this method in the context of a grossly simplified model, let us consider a lattice random walk on a line in which the random walker moves in one direction only, which we choose to be the positive x direction. Let k_i be the rate constant for the random walker to move from i to i + 1, and assume that the random walker is initially at i = 0. We will assume that the k_i are identically distributed independent random variables. Let p_n(t) be the probability that the random walker is at n at time t. These probabilities satisfy the equations

$$\begin{aligned} \dot p_0(t) &= -k_0\, p_0(t) \\ \dot p_n(t) &= k_{n-1}\, p_{n-1}(t) - k_n\, p_n(t), \qquad n = 1, 2, 3, \ldots \end{aligned} \qquad (3)$$

While these equations are readily solved exactly, I will use them to illustrate the basic ideas behind the effective medium approximation, as well as a number of related techniques which have been used in solid state physics. In the context of the present problem, one assumes that there is a set of probabilities {q_n(t)} which approximate the solution to eq. (3). These are taken to be the solution to the coupled set of equations

$$\begin{aligned} \dot q_0(t) &= -\int_0^t K(t-\tau)\, q_0(\tau)\, d\tau \\ \dot q_n(t) &= \int_0^t K(t-\tau)\,[q_{n-1}(\tau) - q_n(\tau)]\, d\tau, \qquad n = 1, 2, 3, \ldots \end{aligned} \qquad (4)$$

Thus, the Markovian equations in eq. (3) are replaced by the coupled set of non-Markovian equations in eq. (4), written in terms of an as yet undetermined kernel K(t). What we observe in the formulation of eq. (4) is that the approximating random walk takes place on a line whose properties are translationally invariant. The crucial step in the effective medium approximation is a technique for calculating the kernel K(t) in terms of properties of the k_i.

A formal solution to eq. (4) is readily found. Introduce the Laplace transforms q_n(s) and K(s) by

$$q_n(s) = \int_0^\infty e^{-st} q_n(t)\, dt, \qquad K(s) = \int_0^\infty e^{-st} K(t)\, dt \qquad (5)$$

One readily verifies that the Laplace transform of the solution to the system of equations in eq. (4) is

$$q_n(s) = \frac{K(s)^n}{[s + K(s)]^{n+1}} \qquad (6)$$

In order to find an expression for K(s), we replace the original formulation given in eq. (3) by a model in which only a single rate is random (it doesn't matter which one), while the remainder of the medium is regarded as having the properties of the effective medium defined in eq. (4). Let k_j be the single random rate constant, and let p_{n,j}(t) be the probability, in this modified model, that the random walker is at n at time t. The p_{n,j}(t) satisfy the set of equations in eq. (4), with the exception of the indices j and j + 1, for which the equations become

$$\begin{aligned} \dot p_{j,j}(t) &= \int_0^t K(t-\tau)\, p_{j-1,j}(\tau)\, d\tau - k_j\, p_{j,j}(t) \\ \dot p_{j+1,j}(t) &= k_j\, p_{j,j}(t) - \int_0^t K(t-\tau)\, p_{j+1,j}(\tau)\, d\tau \end{aligned} \qquad (7)$$
The Laplace transform of the kernel K(t) is then found from the solution to the transform of the set of equations in eq. (7), by requiring that the expectation of the solution to the modified system be equal to the solution for the state probabilities in the effective medium, i.e.:

$$q_n(t) = E\{p_{n,j}(t)\} \qquad (8)$$

A solution for the Laplace transforms p_{j,j}(s) and p_{j+1,j}(s) is readily calculated from the combination of eqs. (4) and (7) to be

$$p_{j,j}(s) = \frac{K(s)^j}{(s + k_j)[s + K(s)]^j}, \qquad p_{j+1,j}(s) = \frac{k_j K(s)^j}{(s + k_j)[s + K(s)]^{j+1}} \qquad (9)$$

On making use of the requirement in eq. (8) we find that K(s) is the solution to

$$\frac{1}{s + K(s)} = E\left\{\frac{1}{s + k}\right\} \qquad (10)$$

where we have omitted the subscript on k because of our assumption that the random rate constants are identically distributed. It is easy to confirm that the q_n(s) can be expressed as

$$q_n(s) = \left[E\left\{\frac{k}{s + k}\right\}\right]^n E\left\{\frac{1}{s + k}\right\} \qquad (11)$$

which implies that the crucial quantity for our model is the expectation E{k/(s + k)} or, equivalently, E{1/(s + k)}.

In the present, completely trivial, model it is possible to show that eq. (11) is equivalent to the result found by taking the expectation of the exact solution of eq. (3). This solution is

$$p_n(s) = \frac{k_0 k_1 \cdots k_{n-1}}{(s + k_0)(s + k_1)\cdots(s + k_n)} \qquad (12)$$
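For this trivial model, the agreement between eq. (11) and the average of eq. (12) can be checked by brute force. The sketch below assumes a hypothetical two-valued rate distribution: k = 0.5 or 2.0 with equal probability. Because the distribution is discrete, the average of eq. (12) over k_0, ..., k_n can be computed exactly by enumerating all configurations instead of by sampling. K(s) is obtained from eq. (10) and q_n(s) from eq. (6).

```python
import math
from itertools import product


def ema_kernel(s, rates, weights):
    """K(s) from eq. (10): 1 / (s + K) = E{1 / (s + k)}."""
    mean_inverse = sum(w / (s + k) for k, w in zip(rates, weights))
    return 1.0 / mean_inverse - s


def ema_q(n, s, rates, weights):
    """Effective-medium q_n(s) from eq. (6)."""
    K = ema_kernel(s, rates, weights)
    return K ** n / (s + K) ** (n + 1)


def exact_average_p(n, s, rates, weights):
    """E{p_n(s)}: eq. (12) averaged over every configuration of k_0, ..., k_n."""
    total = 0.0
    for config in product(range(len(rates)), repeat=n + 1):
        prob = math.prod(weights[c] for c in config)
        ks = [rates[c] for c in config]
        numerator = math.prod(ks[:n])               # k_0 k_1 ... k_{n-1}
        denominator = math.prod(s + k for k in ks)  # (s + k_0) ... (s + k_n)
        total += prob * numerator / denominator
    return total


rates, weights = (0.5, 2.0), (0.5, 0.5)  # hypothetical two-valued rate distribution
for n in range(5):
    for s in (0.1, 1.0):
        print(n, s, ema_q(n, s, rates, weights), exact_average_p(n, s, rates, weights))
```

The two columns agree to rounding error. They must agree here, because the k_i enter eq. (12) as independent factors; for more general models no such identity holds.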


The identification of exact and approximate solutions is not readily demonstrated for more general models. In fact, solving the analogue of eq. (10) generally requires the solution of a transcendental equation [3]. In practice this limits one to the calculation of properties in the limit s → 0, or equivalently in the limit t → ∞. A discussion of the errors incurred in the use of the effective medium approximation in the context of a simple one-dimensional example is contained in the review by Haus and Kehr [3]. One of the attractive features of the effective medium approximation is that it is no harder to treat problems in three dimensions than one-dimensional problems, and the accuracy of the approximation generally increases as the number of dimensions increases. This is not true for a number of other techniques that have been applied to this general class of problems (e.g., the renormalization group approach suggested in ref. 9, which is restricted to one dimension only).

Finally, we mention a completely phenomenological approach that has been successfully applied to problems of the transport of carriers in amorphous semiconductors [10,11], as well as to models for chromatographic kinetics [12]. In the first of these applications the transport is generally non-diffusive, while in the second it may or may not be diffusive. The model on which the analyses are based is known as the continuous-time random walk (CTRW) in the literature of physics and physical chemistry [13,14]. This class of models is based on the simplest picture of a random walk, in which the displacement on a given step and the time between successive steps of the walk are both assumed to be identically distributed independent random variables. The space and time variables are often assumed to be uncorrelated, so that the probability (or probability density) for the displacement r that follows an interstep time t can be written in factorized form as

$$\psi(\mathbf{r}, t) = p(\mathbf{r})\, \psi(t) \qquad (13)$$

Only in the case in which ψ(t) = k exp(−kt) is the resulting process Markovian. However, it is known that the random walk still behaves like ordinary diffusion at long times, provided two conditions hold: the first moment of ψ(t) is finite, and the variance-covariance matrix for the displacement consists of finite elements. In that case the asymptotic properties of the random walk in an infinite medium are those calculated by means of the central limit theorem, which is equivalent to ordinary diffusion [14].

The principal idea suggested by Scher and Lax is that a detailed description of randomness in a medium can be replaced by the randomness inherent in the pausing time density ψ(t) appearing in eq. (13). This ansatz cannot be justified in any detailed way (although it has been justified in the calculation of one quantity of physical interest by Klafter and Silbey [15]). Nevertheless, a number of studies have shown that the consequences of the theoretical development yield results in good agreement with experimental data [16]. The key assumption in many of these calculations is that ψ(t) behaves asymptotically as

$$\psi(t) \sim \frac{T^\alpha}{t^{1+\alpha}}, \qquad t \to \infty \qquad (14)$$

where 0 < α < 1, so that the first moment of the interstep time is infinite. Symmetric CTRWs with the properties in eqs. (13) and (14), whose single-step displacements have a finite mean-square value, can be shown to have the asymptotic property

$$E(r^2) \sim (t/T)^\alpha \qquad (15)$$

where E(r²) is the expected value of the squared displacement. When eq. (14) is valid, the asymptotic form of the probability density of the displacement at time t will also differ from the Gaussian form that holds in ordinary diffusion.
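A short simulation illustrates eq. (15). In this sketch (our illustration), α = 0.5, the unit time scale, the ±1 steps and the sample sizes are assumptions. Pausing times are drawn as 1 + Lomax(α) with NumPy's Generator.pareto. This gives P(τ > t) = t^{−α} for t ≥ 1, and hence a density with the power-law tail of eq. (14).

```python
import numpy as np


def ctrw_msd(times, alpha, walkers=1500, max_steps=1500, seed=4):
    """E(r^2) at the given times for a symmetric +/-1 CTRW with heavy-tailed pauses.

    Pauses are 1 + Lomax(alpha), so P(tau > t) = t**(-alpha) for t >= 1 and the
    pausing-time density has the power-law tail assumed in eq. (14).
    """
    rng = np.random.default_rng(seed)
    arrivals = np.cumsum(1.0 + rng.pareto(alpha, size=(walkers, max_steps)), axis=1)
    if not (arrivals[:, -1] > max(times)).all():
        raise ValueError("max_steps is too small for the requested times")
    positions = np.cumsum(rng.choice(np.array([-1, 1]), size=(walkers, max_steps)), axis=1)
    rows = np.arange(walkers)
    msd = []
    for t in times:
        steps_done = (arrivals <= t).sum(axis=1)  # steps completed by time t
        x = np.where(steps_done > 0, positions[rows, steps_done - 1], 0)
        msd.append(float(np.mean(x.astype(float) ** 2)))
    return np.array(msd)


times = np.geomspace(1e2, 1e5, 10)
msd = ctrw_msd(times, alpha=0.5)
slope = np.polyfit(np.log(times), np.log(msd), 1)[0]
print(slope)  # expected near alpha = 0.5; ordinary diffusion would give 1
```

With ±1 steps, E(r²) equals the mean number of steps completed by time t. That number grows as t^α when the mean pausing time is infinite, which is the mechanism behind eq. (15).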

Some of the properties and many of the applications of CTRWs satisfying eq. (14) have recently been reviewed by Shlesinger [17], and some of the mathematical properties of CTRWs based on the assumption of eq. (14) are given in ref. 18. It must be said that arguments based on CTRW models can only be characterized as hand-waving. The applicability of such models to transport in disordered media can therefore only be ascertained on a case-by-case basis, and it is by no means clear that CTRW approximations always lead to useful results.

I have presented only a small sample of an enormous number of different approaches taken in the literature of physics and chemistry to the general problem of transport in a disordered medium. The development of techniques for the solution, complete or partial, of these problems is a particularly active area of research in physical chemistry, physics and probability theory. Some of the basic phenomena that warrant investigation have been very ably outlined in the article by Professor ben-Avraham.
Abstract


Kopelman.  R ,  Anacker,  LW .  Clement,  R,  L»,  L  and  Sander,  L ,  1991  Low  dimensional  reaction  kinetics  and  self-organization 
Chemometrics and Intelligent Laboratory Systems, 10: 127-132.

Diffusion-limited reaction kinetics becomes anomalous not only for fractals, with their anomalous diffusion, but also for low-dimensional (one and two) and disperse media, where the random walk is compact. We focus on annihilation, recombination and trapping reactions under non-equilibrium steady-state (steady source) or batch (big bang) conditions. The typical reactions are A + A → products, A + B → products and A + C → products. We are interested in the global rate laws and their relation to particle-particle distributions (e.g., pair-correlation and nearest-neighbor distribution functions), and in local rate laws (if definable). Anomalous reaction kinetics is particularly sensitive, more so than classical kinetics, to initial conditions, source term structure, conservation laws (e.g., equal densities for A and B), excluded volume effects, and medium size, dimensionality and anisotropy. Analytical formalisms, scaling arguments, computer (and supercomputer) simulations and experiments (on chemical and physical reactions) all play an important role in the newly emerging picture.


INTRODUCTION 

This work can be viewed as a natural extension of the activity dealing with relaxation phenomena and transient kinetics problems in disordered media [1-4]. Its domain of application spans various areas of the physics and chemistry of condensed matter. For example, reactions of the type A + A → 0 or A + T → T are models describing exciton kinetics in disordered molecular crystals or polymer blends. Reactions of the type A + B → 0 are found in solid state physics in the case of electron-hole annihilation or defect fusion. A combination of experiments and Monte Carlo simulations [5] has paved the way for a new theoretical understanding of steady-state rate laws and of the kinetic self-organization of atoms, defects and elementary excitations in low-dimensional media. This theory is presented below.

Diffusion-limited trapping is of particular interest in studies of energy migration and luminescence [1,5]. We present below some new simulations and their relation to theory, covering both rate laws and self-ordering. Of particular interest is the resulting anomalously high partial order of reaction as a function of trap concentration.

'Big-bang' reaction models are simpler than steady-source models. The pioneering work was done by Ovchinnikov and Zeldovich [6] and by Toussaint and Wilczek [7], with applications to fractals by Klafter et al. [8], Kang and Redner [9] and Klymko and Kopelman [10]. However, these works ignored both finite size effects and finite correlation effects (at time zero). We demonstrate here that these finite extent effects give rise to new scaling effects, i.e., anomalous time exponents and reaction orders. In particular, for the A + B reaction in one dimension the time exponent rises from 1/4 (the Zeldovich value) to 3/4 or 1 (depending on boundary conditions).

0169-7439/91/$03.50 © 1991 Elsevier Science Publishers B.V.


Fig. 1. Schematic representation of the three cases of self-organization in a one-dimensional system. The circled domains represented here are of the order of Λ, the self-organization scale. (a) Depletion in the A + A → 0 case; (b) segregation in the A + B → 0 case; (c) trap-particle depletion in the A + T → T case.


THEORY: STEADY-STATE DIFFUSION-CONTROLLED BIMOLECULAR REACTIONS


In the classical picture, all bimolecular reactions are the same and the distribution of reactants is random. Also, the reaction rate is proportional to the product of the reactant densities (overall order of reaction X = 2). Previous works show that the time dependence of such reactive systems, relaxing from an initial random situation, exhibits anomalous decay rates in low dimensions due to local fluctuations in reactant density [6-9]. Here we report the results of a theoretical investigation of the steady-state properties of three different bimolecular diffusion-limited reactions, taking place on regular Euclidean spaces and on fractal structures [11-13]. We show that the relevant parameter describing the steady state of the reaction kinetics is the spectral dimension d_s. The spectral dimension is an intrinsic parameter characterizing energy transfer properties and, in particular, diffusion in a medium. For Euclidean structures, d_s is the Euclidean dimension d, and the case of Euclidean spaces is viewed as an extension of the fractal case when we take d = d_s. The spectral dimension influences reaction kinetics because d_s controls the time dependence of the number of distinct sites visited by a random walker. For spectral dimension d_s < 2 we show that a bimolecular reaction induces a self-organization of reactants up to a scale Λ such that:

d_s < 2    (1)


where τ is a characteristic time which is situation dependent. For d_s > 2, Λ is microscopic and independent of τ; therefore no large-scale structure exists and the reaction kinetics is classical. The case d_s = 2 is found to be the critical dimension of the problem, where we find a marginal logarithmic dependence of Λ on τ. Below the critical dimension, large-scale density fluctuations become relevant and each situation has its own phenomenology (see Fig. 1). In particular, we may find macroscopic reaction laws with anomalous reaction orders (larger than 2) or anomalous rate constants. In all the cases investigated we found that the scaling behavior of the self-organization length can be cast in an interesting general way. For every dimension we can write:

$$\lambda / a \sim V_t / S_t$$

where $a$ is the microscopic scale, $S_t$ is the volume effectively explored by a particle during the time $t$ (number of distinct sites visited) and $V_t$ is the total (cumulative) volume swept out (proportional to $t$).
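To make $S_t$ and $V_t$ concrete, the sketch below counts the distinct sites visited by a nearest-neighbor random walk on an infinite one-dimensional lattice ($d_s = 1$). It assumes unit lattice spacing and takes $V_t = t$ (one site swept per step); the function names are illustrative only. For $d_s = 1$ the mean $S_t$ is expected to grow roughly as $t^{1/2}$, so the mean number of visits per distinct site, $V_t/S_t$, also grows roughly as $t^{1/2}$, whereas for $d_s > 2$ it would saturate.

```python
import random


def distinct_sites_1d(n_steps, rng):
    """Return [S_0, S_1, ..., S_n]: distinct sites visited by a 1D random walk."""
    position = 0
    visited = {position}
    s = [1]  # S_0 = 1: the starting site counts as visited
    for _ in range(n_steps):
        position += rng.choice((-1, 1))  # nearest-neighbor step
        visited.add(position)
        s.append(len(visited))
    return s


def mean_distinct_sites(n_steps, n_walkers, seed=0):
    """Average S_t over independent walkers (reproducible via the seed)."""
    rng = random.Random(seed)
    totals = [0] * (n_steps + 1)
    for _ in range(n_walkers):
        for t, s_t in enumerate(distinct_sites_1d(n_steps, rng)):
            totals[t] += s_t
    return [x / n_walkers for x in totals]


if __name__ == "__main__":
    s = mean_distinct_sites(n_steps=10_000, n_walkers=100)
    for t in (100, 1_000, 10_000):
        # V_t is taken as t, so t / S_t is the mean number of visits per site
        print(f"t = {t:>6}  S_t = {s[t]:7.1f}  V_t/S_t = {t / s[t]:6.2f}")
```

The $t^{1/2}$ growth shows up as a ratio close to 2 between $S_{4000}$ and $S_{1000}$; more walkers only reduce the statistical scatter.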

In bimolecular diffusion-limited processes, the overall balance between reaction rates and steady-state densities is accounted for by the Smoluchowski boundary condition:

$$Q \approx \rho_1 \rho_2 / \lambda$$

where $\rho_1$ and $\rho_2$ are, respectively, the steady-state densities of reactants 1 and 2 (1 and 2 can be identical species). The scaling dependence of the self-organization scale $\lambda$ on $\tau$ is at the origin of the non-classical behavior.
In the case of homomolecular annihilation, $A + A \to 0$, $\lambda$ is a typical scale of depletion around each reactant and $\tau$ is the typical reactant lifetime, with:

$$\tau \approx \rho / Q$$

where $\rho$ is the steady-state density of A. We obtain an anomalous effective reaction order $X$, with $Q \propto \rho^{X}$:

$$X = 1 + 2/d_s, \qquad d_s < 2 \qquad (2)$$
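The exponent in eq. (2) can be recovered, dropping prefactors of order one, by combining the lifetime relation $\tau \approx \rho/Q$, the Smoluchowski condition for identical reactants, $Q \approx \rho^2/\lambda$, and the growth law $\lambda \propto \tau^{1-d_s/2}$ for $d_s < 2$. This last form follows from $\lambda/a \sim V_t/S_t$ with $V_t \propto t$ and $S_t \propto t^{d_s/2}$; it is our reading of the scaling relations above and should be treated as an assumption:

$$
\tau \approx \frac{\rho}{Q} \approx \frac{\rho\,\lambda}{\rho^{2}} = \frac{\lambda}{\rho} \propto \frac{\tau^{1-d_s/2}}{\rho}
\;\Longrightarrow\; \tau^{d_s/2} \propto \rho^{-1}
\;\Longrightarrow\; \tau \propto \rho^{-2/d_s},
$$

$$
Q \approx \frac{\rho}{\tau} \propto \rho^{\,1 + 2/d_s}, \qquad X = 1 + \frac{2}{d_s}.
$$

For $d_s = 1$ this gives $X = 3$, and $X \to 2$ (classical second order) as $d_s \to 2$.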

In the case of heteromolecular annihilation, $A + B \to 0$, $\lambda$ is the scale of a self-organization phenomenon called segregation. At steady state, domains of identical species with sizes comparable to $\lambda$ build up in the medium. The situation is more complex than in the homomolecular case, and $\tau$ is found to depend either on the source conditions or on some intrinsic particle lifetime. We separated the source terms into two main categories. In the first category we consider sources for which, at any time, identical numbers of As and Bs are conserved in the medium. If reactants are created at random, we find:

$$\tau \propto L^2$$

where $L$ is the system size. We observe a size-dependent segregation. With the same conservation constraint, if the particles are created as A-B pairs with A and B separated by a distance $\delta$, we have:

$$\tau \propto \delta^2$$

The segregation scale becomes dependent on $\delta$. It is important to notice that for geminate creation we obtain a microscopic segregation scale, and this situation becomes analogous to classical kinetics. In the second category, we consider sources where the conservation constraint is removed. If no other decay mechanism is present, fluctuations in the particle difference grow until we have a complete saturation of the loop with one of the species. There is no reactive steady state. If an extra (first-order) decay mechanism is considered, fluctuations grow up to a size defined by the intrinsic lifetime of the decay mechanism. In particular, if we consider vertical annihilation with an external rate of particles $R$, we have:

$$\tau \propto R^{-1}$$


In  this  case  we  obtain  at  low  density  an  effective 
reaction  order: 

/dt 

On the other hand, if the decay is controlled by an intrinsic mechanism, $A \to 0$ and $B \to 0$, with the same rate constant $K$, then we have

$$\tau \propto K^{-1}$$

We induce a $K$-dependent segregation but no anomalous reaction order. These last three cases are important for practical applications because, apart from geminate particle creation, it is difficult to find a source satisfying the exact conservation constraint. However, even though the conservation is not exact, these cases lead to a mesoscopic segregation (or a total saturation).

For the trapping problem, $A + T \to T$, the fluctuation of the trap distribution is found to be unimportant for the leading scaling behavior of the self-organization length $\lambda$. The relevant fact is that, for $d_s < 2$, the particles A organize around the traps. The typical lifetime at steady state is

$$\tau \approx \rho / Q$$

with $\rho$ the density of A and $Q$ the reaction rate. The scale of the trap-particle organization is

$$\lambda \approx c^{1 - 2/d_s}$$

where $c$ is the trap concentration. We have the anomalous rate law

$$Q \propto \rho\, c^{X} \qquad (3)$$

with an anomalous order relative to the trap concentration,

$$X \approx 2/d_s$$

and we note that the overall reaction order is $1 + 2/d_s$, the same as for the $A + A \to 0$ case.
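Dropping prefactors of order one, the trapping exponents follow from the Smoluchowski condition with $\rho_1 = \rho$ and $\rho_2 = c$, taking the trap-particle scale as $\lambda \approx c^{1-2/d_s}$ (a reconstruction of the garbled expression in the source, so treat it as an assumption). The second line shows the equivalent statement that a walker visits about $1/c$ distinct sites before it is trapped:

$$
Q \approx \frac{\rho\, c}{\lambda} \approx \rho\, c^{\,1-(1-2/d_s)} = \rho\, c^{2/d_s},
\qquad X = \frac{2}{d_s}, \qquad X_{\text{overall}} = 1 + \frac{2}{d_s},
$$

$$
\tau \approx \frac{\rho}{Q} \approx c^{-2/d_s}
\quad\Longleftrightarrow\quad
S_\tau \propto \tau^{d_s/2} \approx c^{-1}.
$$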

SIMULATIONS  OF  STEADY-STATE  TRAPPING 

We tested the trapping eq. (3). The Monte Carlo simulations at the John von Neumann National Supercomputer Center give, for the Sierpinski gasket, a partial order $Y \approx 1.02 \pm 0.02$ with respect to the particle density $\rho$, and a partial order $X \approx 1.47 \pm 0.02$ with respect to the trap density $c$. This is in excellent agreement with the predictions of eq. (3): $Y = 1$ and $X = 2/d_s = 1.465$ ($d_s = 1.365$). Similarly, the simulations for the critical percolation cluster are in excellent agreement with the eq. (3) predictions $Y \approx 1$ and $X = 2/d_s = 1.5$ ($d_s = 4/3$). In addition, the depletion zones around the traps can be seen qualitatively in Fig. 2.

Fig. 2. Distribution of reactants for two different trap concentrations on a percolation cluster at criticality. The traps are the black circles. In Fig. 2a the trap concentration is 0.05; in Fig. 2b the trap concentration is 0.005.
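The predicted partial orders are easy to reproduce from the spectral dimensions. The sketch below assumes the exact Sierpinski-gasket value $d_s = 2\ln 3/\ln 5 \approx 1.365$ and the percolation value $d_s = 4/3$ quoted above, and compares $X = 2/d_s$ with the Monte Carlo value reported for the gasket. Variable and function names are illustrative.

```python
import math


def trap_order(d_s):
    """Predicted partial order X = 2/d_s with respect to the trap concentration."""
    return 2.0 / d_s


# Spectral dimensions: Sierpinski gasket d_s = 2 ln 3 / ln 5 (exact value);
# critical percolation cluster d_s = 4/3 (the value used in the text).
D_S = {
    "sierpinski_gasket": 2.0 * math.log(3) / math.log(5),
    "critical_percolation": 4.0 / 3.0,
}

# Monte Carlo partial order (value, uncertainty) quoted in the text for the gasket.
MEASURED_X = {"sierpinski_gasket": (1.47, 0.02)}

for name, d_s in D_S.items():
    x_pred = trap_order(d_s)
    line = (f"{name}: d_s = {d_s:.3f}, X = 2/d_s = {x_pred:.3f}, "
            f"overall order = {1 + x_pred:.3f}")
    if name in MEASURED_X:
        x_meas, err = MEASURED_X[name]
        line += f", measured X = {x_meas} +/- {err}"
    print(line)
```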


SIMULATIONS OF TRANSIENT A + A → 0 AND A + B → 0

We have employed three types of landing relationships: correlated, random and evenly spaced landings. When a particle is added to a site occupied by another particle, the landing particle may immediately try to land on another empty site, which is called ‘forced landing’. Particles move randomly on a lattice.

Correlated landing occurs when a pair of particles lands simultaneously, separated by a certain number of lattice spacings ($\eta$). One particle of the pair randomly finds an empty site on which to land; then the other particle chooses a site in a random direction at the correlation-length distance from the first particle. If this selected site for the second particle is occupied, both particles of the pair repeat the process described above until they find two empty sites at the correlated distance.
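The correlated landing rule can be sketched as follows for a periodic one-dimensional lattice. This is an illustration under stated assumptions, not the authors' code: occupied sites are kept in a set, the first particle is placed by rejection sampling (equivalent to choosing uniformly among the empty sites), and the hypothetical `random_orientation` flag switches between a definite pair orientation and random (AB or BA) orientations.

```python
import random


def land_correlated_pair(occupied, lattice_length, eta, rng,
                         random_orientation=True, max_tries=100_000):
    """Place one A-B pair on a periodic 1D lattice, eta sites apart.

    occupied: set of occupied site indices, updated in place.
    Returns (site_of_A, site_of_B).
    """
    if eta % lattice_length == 0:
        raise ValueError("eta must not be a multiple of the lattice length")
    for _ in range(max_tries):
        # First particle: rejecting occupied picks is equivalent to a
        # uniform choice among the empty sites.
        first = rng.randrange(lattice_length)
        if first in occupied:
            continue
        # Second particle: distance eta, always to the right for a definite
        # orientation, or in a random direction otherwise.
        step = rng.choice((-1, 1)) if random_orientation else 1
        second = (first + step * eta) % lattice_length
        if second in occupied:
            continue  # both particles of the pair start over
        occupied.update((first, second))
        return first, second
    raise RuntimeError("no pair of empty sites found at separation eta")


def correlated_landing(lattice_length, n_pairs, eta, seed=0, random_orientation=True):
    """Return a dict mapping site -> species ('A' or 'B') after n_pairs landings."""
    rng = random.Random(seed)
    occupied = set()
    species = {}
    for _ in range(n_pairs):
        a_site, b_site = land_correlated_pair(
            occupied, lattice_length, eta, rng, random_orientation)
        species[a_site] = "A"
        species[b_site] = "B"
    return species
```

With `eta=1` and `random_orientation=False`, every A has its partner B on the neighboring site to its right.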

Random  landing  occurs  when  two  particles  of  a 
pair  are  independent  of  each  other,  and  all  sites  in 
a  lattice  have  equal  probability  for  a  particle  to 
land.  Effectively  there  are  no  ‘pairs’. 

Evenly spaced landing is used only in simulations of transient reactions. Particles are distributed throughout the lattice with an equal distance between neighboring particles. This interval is equal to $L/N_0$ and is chosen to be an integer, where $L$ is the lattice length and $N_0$ is the number of particles at $t = 0$.

Since the kinetic equation can be written, for long times, as

$$\rho \sim t^{-\alpha} \qquad (4)$$

the kinetic data are plotted as $\ln \rho$ vs. $\ln t$. A linear least-squares fit is applied to find the slope of each part of each line, which is equal to $-\alpha$ in eq. (4).
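The fitting step can be sketched as an ordinary linear least-squares fit of $\ln\rho$ against $\ln t$ inside a chosen time window, returning $\alpha$ as minus the slope. The function name and the windowing arguments are assumptions; fitting each part of a line then amounts to one call per window, for example before and after a crossover.

```python
import math


def decay_exponent(times, densities, t_min=None, t_max=None):
    """Fit ln(rho) = const - alpha * ln(t) by least squares on [t_min, t_max].

    Returns alpha, i.e. minus the fitted slope, as in eq. (4).
    """
    pts = [(math.log(t), math.log(r)) for t, r in zip(times, densities)
           if (t_min is None or t >= t_min) and (t_max is None or t <= t_max)]
    if len(pts) < 2:
        raise ValueError("need at least two points in the fitting window")
    n = len(pts)
    x_mean = sum(x for x, _ in pts) / n
    y_mean = sum(y for _, y in pts) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in pts)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pts)
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic curve with a crossover at t = 100 from slope -0.25 to -0.75.
    ts = [10 ** (k / 20) for k in range(81)]  # t = 1 ... 10^4, log-spaced
    rho = [t ** -0.25 if t <= 100 else 100 ** -0.25 * (t / 100) ** -0.75
           for t in ts]
    print(round(decay_exponent(ts, rho, t_max=100), 3),
          round(decay_exponent(ts, rho, t_min=100), 3))
```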

Correlated landing for A + B → 0

A. For $\eta = 1$. Two kinds of landing are investigated. One is a pair of AB particles with a definite orientation (e.g., ABABAB...). The other is a pair of AB particles with random orientations (e.g., AB AB BA ...). These two cases have shown the same result: a straight line with a slope of 0.5. It is important to notice that this result is the same as that in the $A + A \to 0$ case (see below).

Fig. 3. $\ln(\rho/\rho_0)$ vs. $\ln t$ for the $A + B \to 0$ transient reaction on a one-dimensional lattice (30000 sites) with $\rho_0 = 0.05$. From top to bottom, the correlated landing lengths are 1000, 100, 64, 16 and 1. The dashed lines are fitting lines: (1) with slope 0.5, (2) with slope 0.6, (3) with slope 0.7 and (4) with slope 0.25.

B. For $\eta > 1$. The slopes of the lines increase (from a value of 0.25) after $t > \eta^2$ (see Fig. 3), which is considered to be the effect of correlation in the landing processes. As $\eta$ increases, the slopes at long times increase toward the value 0.75.

For $\eta > 1$, there is no finite-size effect, i.e., no second transition of the slope was found (see the bottom curve in Fig. 4).


Fig. 4. $\ln(\rho/\rho_0)$ vs. $\ln t$ for the $A + B \to 0$ transient reaction on a one-dimensional lattice (…000 sites) with $\rho_0 = 0.20$. From top to bottom: random landing with reflecting boundary conditions, random landing with periodic boundary conditions, and correlated landing with a correlated landing length of 50. The dashed lines are fitting lines: (1) with slope 1.0, (2) with slope 0.75 and (3) with slope 0.25.


Random landing for A + B → 0

Two types of boundary conditions are applied: periodic and reflecting. In both cases, the slopes increase (from the value 0.25) at long times (see Fig. 4), which is considered to be a finite-size effect. However, important differences between these cases were observed: the $\alpha$ value is higher with periodic boundary conditions ($\approx 1.0$) than with reflecting boundary conditions ($\approx 0.75$).

A + A → 0

Both random landing and correlated landing processes are simulated. Under periodic boundary conditions, neither the effect of correlated landing nor the finite-size effect can be found in the $A + A \to 0$ case (see top two lines in Fig. 3); straight lines are found with a slope of 0.50. However, under reflecting boundary conditions, a slight deviation from the slope 0.5 is observed at long times.

Our results essentially agree with preliminary continuum models [14], replacing the Zeldovich-Kang-Redner time exponent $-d_s/4$ (for $d_s \le 4$) with $-(d_s + 2)/4$ (for $d_s \le 2$) for tightly correlated systems or finite-sized lattices. However, they emphasize the relative importance of the average interparticle distance and of the finite scale of the lattice or of the correlation in the source. In particular, for geminate landing, we do not observe a change in slope at late times.
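For the one-dimensional lattices used in these simulations ($d_s = 1$), the two decay exponents above, written as $\alpha$ in $\rho \sim t^{-\alpha}$, take the values seen as limiting slopes in Figs. 3 and 4:

$$
\alpha_{\text{ZKR}} = \frac{d_s}{4} = 0.25,
\qquad
\alpha_{\text{corr}} = \frac{d_s + 2}{4} = 0.75
\qquad (d_s = 1).
$$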
Abstract

Weitz, D.A., Lin, M.Y. and Lindsay, H.M., 1991. Universality laws in coagulation. Chemometrics and Intelligent Laboratory Systems, 10: 133-140.

We show that the process of irreversible, kinetic colloid aggregation exhibits universal behavior, independent of the detailed chemical nature of the colloidal particles. Modern methods of statistical physics, applied to a kinetic growth process, provide a good basis to model the observed behavior. Two limiting regimes of colloid aggregation are identified: rapid aggregation, limited solely by the diffusion of the growing clusters, and slow aggregation, limited by the reaction rate that leads to the formation of bonds between the clusters. In each regime the cluster structure is fractal, with fractal dimension $d_f \approx 1.8$ for diffusion-limited clusters and $d_f \approx 2.1$ for reaction-limited clusters. A scaling method is used to compare dynamic light scattering data obtained from completely different colloids aggregated under the two limiting conditions. These data provide a critical comparison of the behavior of the different colloids, and confirm the universality of each limiting regime of colloid aggregation.


* Present address: National Institute of Standards and Technology, React A106, Gaithersburg, MD 20899, USA.

INTRODUCTION

The aggregation of colloidal particles to form larger clusters is a process of wide technological importance and of great scientific interest. It has been the subject of serious scientific study for well over one hundred years. However, until recently the great complexity of the problem had limited the extent of our understanding of the process. The structure of the clusters is highly random and disordered, making a quantitative analysis of their shape quite difficult. Furthermore, a wide variety of different types of behavior can be seen for even a single colloid. This has precluded the development of a simple theoretical understanding of this complex, yet important process.

More recently, however, significant progress has been achieved in our understanding of irreversible colloid aggregation [1-3]. The impetus for much
of this progress has been the recent developments in statistical physics. Scaling concepts, which have found so much success in describing such reversible processes as phase transitions, have now also been applied with similar success to irreversible kinetic growth processes, such as colloid aggregation. Indeed, recent work has shown that irreversible colloid aggregation exhibits universal behavior, which transcends the chemical details of the particular colloid system, and which provides a unified, and relatively simple, description of this complex process [4,5]. In this paper, we present a brief review of the recent applications of these concepts of modern statistical physics to colloid aggregation, and discuss the universal features that have emerged.

There are two general classes of colloid aggregation which have been widely studied [1]. Both begin with a monodisperse suspension of small, solid particles undergoing Brownian motion. When the aggregation is initiated, the diffusive motion of the particles leads to collisions between them, causing them to stick together and form larger clusters. In the first class of aggregation, the clusters, once formed, no longer diffuse, and all aggregation is due to the accretion of single particles. This class is called single-particle aggregation. By contrast, in the second class, the clusters themselves continue to diffuse, collide and form yet larger clusters. As the clusters grow, what began as a monodisperse distribution of single particles evolves into a very complex distribution of clusters of different sizes. This class is called cluster-cluster aggregation. Both types of aggregation have been extensively studied theoretically. However, most experimental studies of colloid aggregation have focused on the cluster-cluster class, as it is by far the most commonly encountered.

Several key features characterize any aggregation process [3]. These include the structure of the clusters, the kinetics of the aggregation, and the shape of the cluster mass distribution and its evolution in time. It is in the description of each of these features that the application of modern methods of statistical physics and the concepts of scaling has provided such progress. The first application of these techniques was to the description of the structure of the clusters. The cluster structure is highly random and disordered, and had long defied any quantitative description. However, the cluster structure can, in fact, be quantitatively parameterized by means of a type of symmetry, that of invariance under a change in length scale, or dilation symmetry. Thus colloidal aggregates can be characterized as fractals [6], and their structure can be quantitatively parameterized by means of their fractal dimension [7]. The aggregation kinetics, and the shape and time evolution of the cluster mass distribution, can both be addressed through the application of scaling, in this case in time. The shape of the cluster mass distribution is found to be invariant in time, with all the time dependence described by the evolution of the average cluster mass [8,9].

The fundamental property which determines the nature of cluster-cluster aggregation is the form of the interaction potential between two colloidal particles as they approach one another [10]. Colloidal particles which are stable against aggregation have some form of repulsive interaction which prevents two approaching particles from touching and sticking together. This repulsion is often due to charged groups adsorbed on the surface of the colloidal particles, but can also arise from other sources, such as a thin coating of polymer on the particle surface. The height of the resultant repulsive barrier, $E_b$, must be much greater than $k_BT$ for the colloid to be stable against aggregation. If $E_b$ is reduced, colliding particles can surmount the barrier and stick together, thus initiating the aggregation process. The rate of aggregation will be determined by the probability, $P$, that two particles will stick upon colliding. This is determined by the height of the remaining barrier, and is given by $P \sim \exp(-E_b/k_BT)$.
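To illustrate how sensitive the aggregation rate is to the barrier height, the sketch below evaluates $P \sim \exp(-E_b/k_BT)$ with a unit prefactor (an assumption) and reports $1/P$, the mean number of collisions needed before two particles stick if each collision sticks independently with probability $P$. Lowering $E_b$ by $5\,k_BT$ raises $P$ by a factor $e^5 \approx 150$. The function name is illustrative.

```python
import math


def sticking_probability(barrier_kT):
    """P ~ exp(-E_b / k_B T) for a barrier given in units of k_B T.

    Assumes a unit prefactor and caps P at 1 (every collision sticks as E_b -> 0).
    """
    if barrier_kT < 0:
        raise ValueError("barrier height must be non-negative")
    return min(1.0, math.exp(-barrier_kT))


if __name__ == "__main__":
    for eb in (0, 1, 5, 10, 15, 20):
        p = sticking_probability(eb)
        # 1/P: mean number of collisions per bond for independent collisions
        print(f"E_b = {eb:>2} k_BT -> P = {p:.2e}, about {1 / p:.0f} collisions per bond")
```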

The exponential dependence of the sticking probability on $E_b$ makes the aggregation rate very sensitive to the value of the repulsive energy barrier, and a very wide range of aggregation rates can be obtained with any colloidal suspension. However, there are two characteristic, limiting regimes of aggregation [11]. In the first, the repulsive barrier is removed completely, so that $E_b \ll k_BT$ and $P \approx 1$. In this case, every collision results in the particles or clusters sticking to one another, and the aggregation rate is limited solely by the time between diffusion-induced collisions. This class of aggregation is called diffusion-limited colloid aggregation (DLCA). In the second regime, the repulsive barrier is reduced by only a small amount, so that $E_b > k_BT$ and $P$ is very small. In this case, a large number of collisions are required before two particles or clusters stick to one another, which limits the aggregation rate. This regime is called reaction-limited colloid aggregation (RLCA). The two regimes lead to very rapid and very slow aggregation, respectively, and have been recognized as such in the traditional colloid literature [10]. However, they also form two limiting types of behavior, with distinct and universal features characteristic of each.

The ‘rules’ which determine the aggregation in each regime are quite simple. In DLCA, two clusters stick immediately upon contact, and the diffusive nature of the motion of the clusters plays an important role in determining both their structure and the aggregation kinetics. The diffusive motion ensures that the clusters always stick to one another at the edges, making the resultant aggregates significantly more tenuous. By contrast, in RLCA, the sticking probability is so low that, on an average, statistical basis, two clusters can adopt any bonding configuration that is physically possible, since the clusters have sufficient opportunity to explore all possible configurations. Thus the diffusive nature of the cluster motion does not play a significant role in the aggregation process, and the clusters no longer stick solely at the edges, making their structure significantly less tenuous. In both regimes, the bonds between particles, once formed, are assumed to be both permanent and rigid, so that no further change in their structure occurs as the aggregation proceeds.

The nature of the interparticle interactions determines the kinetics of the aggregation process; the kinetics in turn play a significant role in determining the structure of the clusters formed and the shape of the mass distribution of clusters. Furthermore, since a very large number of clusters are involved in any aggregation process, and since the details of the structure of each cluster are not as important as the overall features, a statistical description is well suited to describing the physics. The basic simplicity of the underlying physics facilitates modeling the aggregation process. The models developed deal solely with the nature of the interaction and the resultant “rules” which determine how clusters move and stick to one another. Thus, these models are independent of the detailed chemical nature of each colloid, and should apply equally well to all colloids. It is in this sense that the description of colloid aggregation should be universal.


THEORY 

The two limiting regimes of cluster-cluster aggregation have been studied extensively, and an elegant and detailed picture of their behavior has now been developed [1,3]. The theoretical work has entailed two basic approaches. The simplicity of the rules of the aggregation makes computer simulation a very powerful method for studying both regimes, and considerable knowledge has been obtained about the structure of the clusters and the shape and time evolution of the cluster mass distribution [12]. The aggregation kinetics and the cluster mass distribution have also been studied extensively through the use of the Smoluchowski equations [13]. These are a set of rate equations which assume that the aggregation rate between two clusters depends solely on their masses. Scaling techniques have proven to be well suited to the study of these equations [8,9]. Experimentally, a wide range of colloid systems have been studied using many different techniques. Excellent agreement is obtained between the experimental observations and the theoretical predictions [14,15].

Each regime is distinguished by several distinct characteristics: the clusters formed in each regime are fractal, so that their mass scales with their radius as $M \sim (R/a)^{d_f}$, where $a$ is the radius of a single particle and $d_f$ is the fractal dimension, which is non-integral and less than the dimension of space. For DLCA, $d_f \approx 1.8$, while for RLCA, $d_f \approx 2.1$. The cluster mass distribution in each regime exhibits dynamic scaling and can be written as $N(M) = \bar{M}^{-2}\phi(M/\bar{M})$, where the scaling function $\phi(M/\bar{M})$ describes the shape of the cluster mass distribution and is independent of time, while $\bar{M}$ is the mass of the average cluster and reflects all of the time dependence of the aggregation. For DLCA, $N(M)$ is slightly peaked around the average mass with an exponential cutoff at larger masses. For RLCA, the cluster mass distribution has a power-law form with an
exponential cutoff at large mass, $N(M) \sim \bar{M}^{-2}(M/\bar{M})^{-\tau}\exp(-M/\bar{M})$, where $\tau$ is the power-law exponent. The kinetics of the aggregation are determined by the time dependence of $\bar{M}$: for DLCA, $\bar{M}$ grows linearly with time, while for RLCA it grows exponentially with time.
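One way to see why the dynamic scaling form carries the prefactor $\bar M^{-2}$: with $N(M) = \bar M^{-2}\phi(M/\bar M)$, the substitution $x = M/\bar M$ gives $\int M\,N(M)\,dM = \int x\,\phi(x)\,dx$, which does not depend on $\bar M$, so the total particle mass stays fixed while $\bar M$ grows. The sketch below checks this numerically for continuous $M$. The RLCA-like form $\phi(x) = x^{-\tau}e^{-x}$ with $\tau = 1.5$ is an assumption for illustration; for it, the integral equals $\Gamma(1/2) = \sqrt{\pi}$. Function names are hypothetical.

```python
import math

TAU = 1.5  # assumed RLCA power-law exponent (illustrative, not from the text)


def phi_rlca(x, tau=TAU):
    """Assumed RLCA scaling function: power law with an exponential cutoff."""
    return x ** (-tau) * math.exp(-x)


def n_of_m(m, m_bar, phi=phi_rlca):
    """Dynamic scaling form N(M) = Mbar**-2 * phi(M / Mbar)."""
    return phi(m / m_bar) / m_bar ** 2


def total_mass(m_bar, phi=phi_rlca, n_steps=20_000, x_min=1e-10, x_max=60.0):
    """Approximate the integral of M * N(M) dM over continuous M.

    Trapezoid rule on a grid uniform in ln M, using dM = M d(ln M).
    """
    lo = math.log(x_min * m_bar)
    hi = math.log(x_max * m_bar)
    h = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        m = math.exp(lo + i * h)
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * m * m * n_of_m(m, m_bar, phi) * h
    return total


if __name__ == "__main__":
    # The same total mass for every Mbar: the Mbar**-2 prefactor conserves mass.
    for m_bar in (10.0, 1e3, 1e5):
        print(f"Mbar = {m_bar:>8.0f}: total mass = {total_mass(m_bar):.5f}")
    print(f"sqrt(pi) = {math.sqrt(math.pi):.5f}")
```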


EXPERIMENTAL 

To experimentally demonstrate the universal features of colloid aggregation, we compare the behavior of three completely different colloids: gold, silica and polystyrene latex [4]. Each colloid is composed of a different material; each colloid is initially stabilized by completely different functional groups on its surface; the aggregation of each colloid is initiated in a different manner; the interparticle bonds in the aggregates of each colloid are different; and each colloid has a different primary particle size. However, each colloid can be made to aggregate by either diffusion-limited or reaction-limited kinetics.

The colloidal gold has a particle radius of $a = 7.5$ nm and an initial volume fraction of $\phi_0 = 10^{-6}$. It is stabilized by citrate ions adsorbed on the surface. The aggregation is initiated by addition of pyridine, which displaces the charged ions, reducing the repulsive barrier between the particles. The amount of pyridine added determines the aggregation rate: for DLCA, the pyridine concentration is $10^{-2}$ M, while for RLCA, it is about $10^{-5}$ M. The interparticle bonds are metallic.

The colloidal silica used is Ludox SM obtained from DuPont. It has particles with $a = 3.5$ nm, and is diluted to $\phi_0 = 10^{-6}$. It is initially stabilized by OH$^-$ or SiO$^-$ groups on the surface. The pH is kept < 11 by addition of NaOH, and the aggregation is initiated by addition of NaCl, which reduces the Debye-Hückel screening length, thereby reducing the repulsive barrier between the particles. For DLCA, the salt concentration is 0.9 M, while for RLCA, it is 0.6 M. The interparticle bonds are believed to be silica bonds.

The polystyrene latex has $a = 19$ nm and is diluted to $\phi_0 = 10^{-\ldots}$. It is initially stabilized by charged carboxylic acid groups on the surface of the particles. Addition of HCl to a concentration of 1.2 M is used to neutralize the surface charges and decrease the screening length to initiate the aggregation for DLCA. For RLCA, NaCl is added to a concentration of 0.2 M to reduce the screening length and initiate the aggregation. The particle surfaces deform on bonding, leading to large van der Waals interactions between the bound particles.

To study the aggregation of each colloid and to critically compare their behavior in the two regimes, we use light scattering [16]. Static light scattering is used to measure the fractal dimension of the clusters, while dynamic light scattering is used to follow the aggregation kinetics. In addition, the dynamic light scattering data obtained from each colloid in each regime can be scaled onto a single master curve. The shape of this master curve is very sensitive to the features of the aggregation process, depending on the detailed structure of the clusters and the shape of the cluster mass distribution. However, all features particular to the individual colloids are scaled out of the master curve, allowing the curves from the different colloids to be compared directly, with no free parameters, providing a critical test of the universality of colloid aggregation in each of the two limiting regimes [4].


RESULTS 

Static light scattering measures the time-averaged scattering intensity from the sample, $I(q)$, as a function of the scattering wave vector, $q = (4\pi n/\lambda)\sin(\theta/2)$, where $\lambda$ is the incident wavelength in vacuo, $n$ is the index of refraction of water, and $\theta$ is the scattering angle. Dynamic light scattering measures the temporal autocorrelation function of fluctuations in the scattering intensity resulting from the diffusive motion of the clusters. We measure both the total scattered intensity and the autocorrelation function concurrently as functions of the scattering angle, and hence of the scattering wave vector. The excitation source is the 488-nm line of an Ar$^+$ laser, and the accessible scattering wave vectors are 0.003 to 0.03 nm$^{-1}$.
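As a quick check of the quoted range, the sketch below evaluates $q = (4\pi n/\lambda)\sin(\theta/2)$ for the 488-nm line, assuming $n = 1.33$ for water (a standard value, not stated in the text), and inverts it to find the scattering angles that bound the accessible window. The function names are illustrative.

```python
import math


def scattering_vector(theta_rad, wavelength_nm=488.0, n_medium=1.33):
    """q = (4 pi n / lambda) sin(theta / 2), returned in nm^-1."""
    return 4.0 * math.pi * n_medium / wavelength_nm * math.sin(theta_rad / 2.0)


def scattering_angle(q, wavelength_nm=488.0, n_medium=1.33):
    """Inverse relation: the angle theta (radians) that gives wave vector q (nm^-1)."""
    s = q * wavelength_nm / (4.0 * math.pi * n_medium)
    if not 0.0 <= s <= 1.0:
        raise ValueError("q is outside the range reachable at this wavelength")
    return 2.0 * math.asin(s)


if __name__ == "__main__":
    print(f"q_max (backscattering) = {scattering_vector(math.pi):.4f} nm^-1")
    for q in (0.003, 0.03):
        theta_deg = math.degrees(scattering_angle(q))
        print(f"q = {q} nm^-1 -> theta = {theta_deg:.1f} deg")
```

The upper end of the window, 0.03 nm$^{-1}$, lies close to the backscattering limit at this wavelength, which is why larger $q$ is not accessible.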

Static light scattering probes the internal structure of the aggregates. Because the fractal clusters are self-similar in structure, the scattered intensity from each cluster depends only on the product $qR_g$, where $R_g$ is the radius of gyration of the cluster. At low $qR_g$, the internal structure of the aggregate is not resolved, and the scattered intensity is isotropic, independent of $q$. At high $qR_g$, however, the internal fractal structure is resolved and the scattered intensity scales as $(qR_g)^{-d_f}$. The measured intensity is a weighted average over the cluster mass distribution. However, for aggregates that are sufficiently large, the total measured intensity also exhibits the fractal scaling in $q$, allowing $d_f$ to be determined directly. The static light scattering obtained from all the colloids in each regime is shown in Fig. 1. In each case, the data were collected only after the clusters were sufficiently large that $q\bar{R} \gg 1$, where $\bar{R}$ is an average cluster size. The linear behavior in the double-logarithmic plots confirms the fractal structure of the aggregates. The upper three data sets are obtained from clusters prepared under DLCA conditions and have $d_f = 1.86$ for the gold, $d_f = 1.85$ for the silica and $d_f = 1.82$ for the polystyrene. To within the experimental error of roughly ±0.05, these results are identical. By contrast, the lower three data sets, which are obtained from clusters prepared under RLCA conditions, have consistently higher values of the fractal dimension, with $d_f = 2.14$ for the gold, $d_f = 2.07$ for the silica and $d_f = 2.09$ for the polystyrene. These values are again equal to within experimental error. Thus these results demonstrate the universal behavior of the structure of the fractal colloid aggregates in each of the two regimes.
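A sketch of how $d_f$ can be read off such data, assuming every point passed in satisfies $q\bar R \gg 1$ so that $I(q) \propto q^{-d_f}$: a least-squares fit of $\ln I$ against $\ln q$, with $d_f$ given by minus the slope. The function name and the synthetic data are hypothetical; a real analysis would also account for the experimental uncertainties.

```python
import math


def fractal_dimension(q_values, intensities):
    """Estimate d_f from I(q) ~ q**(-d_f) by least squares on ln I vs ln q."""
    if len(q_values) != len(intensities) or len(q_values) < 2:
        raise ValueError("need at least two (q, I) pairs")
    xs = [math.log(q) for q in q_values]
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic, noise-free data standing in for a measured I(q) curve.
    qs = [0.005 + 0.001 * k for k in range(26)]  # 0.005 ... 0.030 nm^-1
    intensity = [2.0 * q ** -1.85 for q in qs]
    print(round(fractal_dimension(qs, intensity), 3))
```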

Dynamic light scattering probes the diffusive motion of the clusters. When the clusters are large enough that their internal fractal structure can be resolved, both their translational and rotational diffusion contribute to the fluctuations [17]. Here, we consider only the first cumulant [18], or the initial logarithmic derivative of the autocorrelation function of the intensity fluctuations. This is given by $\Gamma_1 = q^2 D_{\mathrm{eff}}(qR_g)$, where the effective diffusion coefficient reflects the contribution of both translational and rotational diffusion. When $qR_g \ll 1$, only translational diffusion contributes and $D_{\mathrm{eff}} = D = \xi/R_h$, where $\xi = k_BT/6\pi\eta$ and $\eta$ is the fluid viscosity. The hydrodynamic radius
is  related  to  the  radius  of  gyration  of  the  cluster. 

with  p~l.  For  qRK^  1.  rotational 
diffusion  also  contributes  and  3*1-2  D. 
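In the $qR_g \ll 1$ limit the first cumulant gives $D$ directly, and the Stokes-Einstein relation converts it to a hydrodynamic radius. The sketch below assumes water at 25 °C ($\eta = 0.89$ mPa s, refractive index 1.33) and a 488 nm laser at 90°; these are illustrative choices, not conditions reported here. For $qR_g \gg 1$ the same arithmetic would fold in rotational diffusion and underestimate $R_h$.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def scattering_vector(n_medium, wavelength_m, angle_rad):
    """q = (4 pi n / lambda_0) sin(theta / 2), in 1/m."""
    return 4.0 * math.pi * n_medium / wavelength_m * math.sin(angle_rad / 2.0)


def effective_diffusion(gamma1, q):
    """D_eff from the first cumulant, Gamma_1 = q**2 D_eff (m^2/s)."""
    return gamma1 / q ** 2


def hydrodynamic_radius(d, temperature_k, viscosity_pa_s):
    """Stokes-Einstein: R_h = k_B T / (6 pi eta D); valid when only translation contributes."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * d)


# Illustrative round trip: a 100 nm cluster -> Gamma_1 -> back to R_h.
T, ETA = 298.15, 0.89e-3
q = scattering_vector(1.33, 488e-9, math.radians(90.0))
d_true = K_B * T / (6.0 * math.pi * ETA * 100e-9)
gamma1 = q ** 2 * d_true  # what the first cumulant would report, 1/s
r_h = hydrodynamic_radius(effective_diffusion(gamma1, q), T, ETA)
print(f"q = {q:.3e} 1/m, Gamma_1 = {gamma1:.1f} 1/s, R_h = {r_h * 1e9:.1f} nm")
```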

The effective diffusion coefficient determined from the measured first cumulant is again a weighted average over all the clusters in the distribution. It is given by

$$\bar D_{\rm eff}(q) = \frac{\sum_M N(M)\, I(qR_g)\, D_{\rm eff}(qR_g)}{\sum_M N(M)\, I(qR_g)} \qquad (1)$$

In the limit $q\bar R \to 0$, $\bar D_{\rm eff} = \bar D$, providing a good measure of the average cluster size, $\bar R \propto 1/\bar D$.

The combination of the sensitivity to the cluster mass distribution and rotational diffusion leads to a pronounced $q$ dependence in the measured $\bar D_{\rm eff}$ and provides a very sensitive probe of the aggregation process [4,16]. However, to fully explore this $q$ dependence at a single point in time during the aggregation process would require an experimentally inaccessible range of scattering angles. Instead, we exploit the dynamic scaling of the cluster mass distribution to measure $\bar D_{\rm eff}$ over a much wider range of $q\bar R$. Thus, we determine $\bar D_{\rm eff}$ over the range of $q$ experimentally accessible and repeat the measurements during the aggregation process, as $\bar R$ increases, while the shape of the cluster mass distribution remains unchanged. The values measured at each $q$ are interpolated to obtain a series of data sets, each consisting of $\bar D_{\rm eff}(q)$ evaluated at the same time. We normalize $\bar D_{\rm eff}$ by $\bar D$ and plot the data as a function of $q\bar R$, where the required parameter, $\bar R \propto 1/\bar D$, for each set is determined empirically by scaling the data onto a single master curve. With sufficient data, there is always a substantial overlap between data from different sets, making the scaling unambiguous. All material parameters are scaled out, so that these master curves provide a means to critically compare the behavior of completely different colloids.

Fig. 2. Master curves obtained independently from dynamic light scattering data from each of the three colloids aggregated under diffusion-limited conditions. The curves are indistinguishable, demonstrating the universality of DLCA. The solid line is the calculated behavior. (O) gold, (+) silica, (×) polystyrene.
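The empirical collapse onto a master curve can be pictured as choosing, for each new data set, the $\bar R$ that makes it overlay a reference set already plotted against $q\bar R$. The sketch below does this by brute-force search over trial radii, scoring the mean squared mismatch on the overlapping range. The master function `g` is hypothetical, the normalized values $\bar D_{\rm eff}/\bar D$ are assumed to be already known, and the text does not specify the authors' own procedure beyond "determined empirically".

```python
def interpolate(x, xs, ys):
    """Linear interpolation of ys(xs) at x (xs increasing); None outside the data range."""
    if x < xs[0] or x > xs[-1]:
        return None
    for k in range(1, len(xs)):
        if x <= xs[k]:
            w = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
            return ys[k - 1] + w * (ys[k] - ys[k - 1])
    return None


def best_radius(q_ref, y_ref, r_ref, q_new, y_new, trial_radii, min_overlap=3):
    """Return the trial R that best maps (q_new, y_new) onto the reference set vs q*R."""
    x_ref = [q * r_ref for q in q_ref]
    best_score, best_r = None, None
    for r in trial_radii:
        squared = []
        for q, y in zip(q_new, y_new):
            y_on_ref = interpolate(q * r, x_ref, y_ref)
            if y_on_ref is not None:
                squared.append((y - y_on_ref) ** 2)
        if len(squared) >= min_overlap:
            score = sum(squared) / len(squared)
            if best_score is None or score < best_score:
                best_score, best_r = score, r
    return best_r


def g(x):
    """Hypothetical master curve D_eff/D versus qR, rising from 1 (illustration only)."""
    return 1.0 + 0.3 * x * x / (1.0 + x * x)


q_grid = [1.0 + 0.05 * k for k in range(41)]  # accessible q range, arbitrary units
early = [g(q * 1.0) for q in q_grid]          # reference set, R = 1.0 by definition
late = [g(q * 1.5) for q in q_grid]           # later set: the clusters have grown
trials = [1.0 + 0.01 * k for k in range(101)]
print(best_radius(q_grid, early, 1.0, q_grid, late, trials))  # close to 1.5
```

Each later set is chained onto the growing master curve in the same way, which is why a substantial overlap between sets matters.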

The master curves obtained for each colloid aggregated under DLCA conditions are shown in Fig. 2, while the master curves for each colloid aggregated under RLCA conditions are shown in Fig. 3. The shape of the master curve for DLCA is quite different from that of RLCA. This reflects the different shapes of $N(M)$ for each regime, with the power-law form for RLCA leading to a considerably stronger $q$ dependence of the master curve. In each regime, the master curves for the three colloids are indistinguishable. We emphasize that the master curves for each colloid are obtained independently, and there is no free parameter in comparing them. This is striking evidence of the universality of each of the regimes of colloid aggregation.

The solid lines drawn through the master curves are the calculated values using eq. (1), with the forms for $N(M)$ expected for each regime and a form for $I(qR_g)$ obtained from computer-simulated clusters for the appropriate regime [19]. The agreement is very good, except for DLCA at large $q\bar R$. The calculation for the RLCA regime allows us to determine the cluster mass exponent, $\tau = 1.5$, which is in accord with theoretical predictions based on the Smoluchowski equations [20].

The scaling values of $\bar R$ also allow us to determine the aggregation kinetics of each colloid in each regime. We show the results for the DLCA regime in Fig. 4, where we plot $\bar R$ as a function of aggregation time $t$ in a double logarithmic plot [14]. The linear behavior exhibited by each colloid confirms the power-law kinetics; the slopes, combined with the measured fractal dimensions, give the power law for the growth of the average mass. In all cases, this exponent is 1 to within experimental error. The different offsets of the three curves reflect the differences in the initial concentrations. The results for the RLCA regime for each of the colloids are shown in Fig. 5, where we now use a semilogarithmic plot to show the exponential growth observed for each colloid [15]. In this case, the different slopes reflect the different initial aggregation rates of each colloid, which do depend on the details of the chemistry. Indeed, for the polystyrene, some time apparently elapses before the final aggregation rate is achieved. We believe that this is caused by the deformation of the particles which occurs on bonding and which modifies the sticking probability at early times. Nevertheless, all colloids display exponential growth of the radius of the average cluster, and hence of the mass, as expected.

Fig. 3. Master curves obtained independently from dynamic light scattering data from each of the three colloids aggregated under reaction-limited conditions. The curves are indistinguishable, demonstrating the universality of RLCA. The solid line is the calculated behavior. (O) gold, (+) silica, (×) polystyrene.

Fig. 4. The aggregation kinetics of the diffusion-limited aggregation of each of the three colloids, obtained from the scaling of the data onto the master curves. The slopes of the power-law kinetics and the fractal dimensions show that the average cluster mass grows linearly with time in all cases. (O) gold, (+) silica, (◆) polystyrene.

Fig. 5. The aggregation kinetics of the reaction-limited aggregation of each of the three colloids, demonstrating the exponential kinetics in each case. (O) gold, (+) silica, (*) polystyrene.
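The exponent bookkeeping in this paragraph can be made explicit. If $\bar R \propto t^{1/d_f}$ in DLCA, the slope of $\ln\bar R$ against $\ln t$ multiplied by $d_f$ is the exponent of the mass growth; in RLCA the slope of $\ln\bar R$ against $t$ is an exponential rate. This Python sketch uses synthetic data whose exponents and rates are chosen for illustration only.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dlca_mass_exponent(times, radii, d_f):
    """Power-law kinetics: slope of ln R vs ln t, times d_f, is the exponent of M(t) ~ R**d_f."""
    slope, _ = linear_fit([math.log(t) for t in times], [math.log(r) for r in radii])
    return slope * d_f


def rlca_growth_rate(times, radii):
    """Exponential kinetics: slope of ln R vs t gives 1/tau for R ~ exp(t/tau)."""
    slope, _ = linear_fit(times, [math.log(r) for r in radii])
    return slope


# Hypothetical data: DLCA with d_f = 1.86 and M ~ t; RLCA with 1/tau = 0.01 per minute.
t = [10.0 * k for k in range(1, 21)]
r_dlca = [50.0 * tk ** (1.0 / 1.86) for tk in t]
r_rlca = [20.0 * math.exp(0.01 * tk) for tk in t]
print(round(dlca_mass_exponent(t, r_dlca, 1.86), 6))  # 1.0
print(round(rlca_growth_rate(t, r_rlca), 6))          # 0.01
```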


CONCLUSIONS 

In summary, we have shown experimental evidence to demonstrate the universal features of colloid aggregation. Two limiting regimes are observed: fast, diffusion-limited and slow, reaction-limited colloid aggregation. Each regime follows universal laws that describe its behavior. In many experimental situations, these limiting regimes are not achieved. Nevertheless, the overall aggregation behavior can usually be described in terms of these two regimes. Typically the initial stages of the aggregation are controlled by some intermediate value of the sticking probability, and the aggregation is not strictly diffusion-limited. Instead, at the earliest times, it can be approximated as reaction-limited. However, as the aggregation proceeds and the concentration of clusters decreases, their spacing increases, and diffusion becomes increasingly important as a rate-limiting step. Thus at longer times the aggregation crosses over to diffusion-limited. These two limiting, and universal, regimes therefore provide the basis for describing a large range of behavior for colloid aggregation.
Abstract


Osteryoung, J., 1991. Inference of mechanism from kinetic analysis of pulse voltammetric data. Chemometrics and Intelligent Laboratory Systems, 10: 141-154.

Voltammetry provides direct access to kinetic information in that the measured quantity, current, is itself the rate. Kinetic analysis of voltammetric data generally focuses on the potential dependence of the current. For historical reasons, the most common method of analyzing data is to transform the data, often by very elaborate methods, to yield a potential-dependent rate constant, which is then plotted as a semilogarithmic function of potential. This procedure requires extrinsic normalization factors which easily can introduce systematic error. In a few instances, statistically sound methods have been employed for analysis of data. One approach employs a nonlinear least squares procedure equivalent to the method of maximum likelihood. In addition to providing optimal values of kinetic parameters without recourse to other data, this method also provides confidence regions at a known level of confidence. This method is implemented by the COOL algorithm, which has been described. An important ancillary factor is that the COOL algorithm runs in 'real time' for many problems. This paper describes these alternative methods of analysis by using the particular example of slow charge transfer. The sensitivity of the analysis to changes in values of parameters is examined by computation of confidence regions. Then three specific kinetic problems are used to illustrate the types of questions which arise in the inference of mechanism. The first involves the search for a second-order dependence of current on potential, this having been predicted by theoretical treatments. The second can arise in cases where two electrons are transferred. Under what conditions is it possible to determine the rate parameters for both transfers? What criteria ensure that the variance in the data is explained by only one charge transfer step (i.e., the other is too fast to see)? The third problem concerns heterogeneous charge transfers coupled by a homogeneous chemical step. When the second charge transfer is more favored than the first, when does it take place through a homogeneous reaction route, and under what conditions can this be detected? The experimental examples include the reduction of Zn(II) and the reduction of p-nitrosophenol, both at mercury electrodes. The data are confounded to some degree by experimental artifacts; nonrandom distribution of residuals may arise from these artifacts or from choice of overly simple models.


INTRODUCTION 

Voltammetry comprises a suite of electrochemical techniques wherein the potential of an electrode is controlled and the resulting current is measured. Time is generally a parameter of the experiment. Pulse voltammetry comprises a subset of voltammetric techniques in which potential is changed only in a stepwise fashion (changes in potential are instantaneous on the time scale of the experiment). The pulse mode has many advantages both experimentally and computationally when the experiment is carried out under the real-time control of a digital computer suitable for high-speed calculations. This paper deals only with pulse voltammetry. However, its main points apply to voltammetry in general.

Voltammetry provides direct access to kinetic information in that the measured quantity, current, is itself the rate. Kinetic analysis of voltammetric data generally focuses on the potential dependence of the current. The purpose of kinetic analysis is generally two-fold: first, to infer from the rate data the mechanism by which chemical transformation takes place, and second, to obtain values of the rate constants or other parameters which characterize the system. Here this general problem is introduced by describing a straightforward example, the simple, first-order slow transfer of an electron.

The phenomenon of potential dependence of the rate is well known and was first formulated empirically in the Tafel equation [1]

$$\eta = a + b \log i \qquad (1)$$

where $\eta$ is the overpotential (potential minus equilibrium potential), $i$ is the steady-state current, and $a$ and $b$ are empirical constants. The experiments which gave rise to this observation employed large concentrations of oxidized and reduced forms in contact with an inert electrode, so that the equilibrium potential was well fixed, and so that small excursions of potential from the equilibrium value would not significantly change the concentrations near the electrode. This mode of kinetic measurement dominated the study of electrochemical kinetics for the next 50 years.

It was not until the development of polarography by Heyrovský in the '20s and '30s [2] that changes in concentration near the electrode and the resulting diffusion were treated mathematically. After World War II, the confluence of mathematical expertise, the computational power offered by computers, and improved electronics permitted dynamic experiments in which potential could be changed rapidly and automatically. An appropriate formulation of the current arising from the reduction of reactant O to product R under these experimental conditions is

$$\frac{i}{nFA} = k_f C_O(0,t) - k_b C_R(0,t) \qquad (2)$$

where

$$k_f = k_s^0 \exp[-\alpha n f (E - E^{0'})] \qquad (3)$$

$$k_b = k_s^0 \exp[(1-\alpha) n f (E - E^{0'})] \qquad (4)$$

and $C_O(0,t)$, $C_R(0,t)$ are the time-dependent concentrations of the oxidized and reduced forms, respectively, at the electrode surface, and $k_s^0$ is the standard apparent heterogeneous charge transfer rate constant, referred to the formal potential, $E^{0'}$, for the reaction

$$\mathrm{O} + n e^- \rightleftharpoons \mathrm{R} \qquad (5)$$

$\alpha$ is the 'charge transfer coefficient', $f = F/RT = 38.9$ V$^{-1}$ at 25 °C, $E$ is the electrode potential, $i$ the current at an electrode of area $A$, and $n$ the number of electrons transferred (eq. (5)). This formulation ignores the effect of charge on the electrode and the corresponding charge distribution in solution. For an elementary process, $n = 1$. In general it is found that even for more complicated processes, eqs. (2)-(4) describe the experimental result, although the value of $n$ in eqs. (3) and (4) may be less than that in eqs. (2) and (5). (Electrons transferred after the rate-determining step do not contribute to $n$ in eqs. (3) and (4).) A complete description of a mechanism ideally consists only of elementary steps. However, here, for convenience and generality, we retain the symbol $n$ and do not distinguish between the overall value and that which applies to the rate-determining step.
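Eqs. (3) and (4) are simple to evaluate once $f = F/RT$ is fixed. The sketch below uses CODATA values of $F$ and $R$ (giving $f \approx 38.92$ V$^{-1}$ at 25 °C, consistent with the 38.9 V$^{-1}$ quoted above) and a hypothetical couple ($k_s^0$, $\alpha$, $E^{0'}$ chosen for illustration); it also checks the identity $k_b/k_f = \exp[nf(E - E^{0'})]$ implied by the two equations.

```python
import math

FARADAY = 96485.33212       # C/mol
GAS_CONSTANT = 8.314462618  # J/(mol K)


def f_factor(temperature_k=298.15):
    """f = F/RT in 1/V."""
    return FARADAY / (GAS_CONSTANT * temperature_k)


def rate_constants(e, e0, k0, alpha, n=1, temperature_k=298.15):
    """Forward (reduction) and backward (oxidation) rate constants of eqs. (3) and (4)."""
    f = f_factor(temperature_k)
    kf = k0 * math.exp(-alpha * n * f * (e - e0))
    kb = k0 * math.exp((1.0 - alpha) * n * f * (e - e0))
    return kf, kb


# Hypothetical couple: k0 = 1e-3 cm/s, alpha = 0.5, n = 1, E0' = -0.40 V.
for e in (-0.50, -0.40, -0.30):
    kf, kb = rate_constants(e, -0.40, 1e-3, 0.5)
    print(f"E = {e:+.2f} V  k_f = {kf:.3e}  k_b = {kb:.3e} cm/s")
```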

The technique of normal pulse voltammetry leads to a simple closed-form solution to the diffusion equation under linear, semi-infinite conditions with eqs. (2)-(4) as a boundary condition. Thus quasireversible charge transfer under normal pulse voltammetric conditions provides a straightforward example of the types of questions which arise in kinetic analyses.

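The current which flows in response to the potential perturbation of normal pulse voltammetry for quasireversible charge transfer is given by

$$i(t) = i_d\,(1+\xi)^{-1}\,\pi^{1/2}\lambda t^{1/2} \exp(\lambda^2 t)\,\mathrm{erfc}(\lambda t^{1/2}) \qquad (6)$$

where

$$\lambda = \kappa\,(1+\xi)\,\xi^{-\alpha} \qquad (7)$$

$$\kappa = \frac{k_s^0}{D_O^{(1-\alpha)/2} D_R^{\alpha/2}} \qquad (8)$$

$$\xi = \exp[nf(E - E^r_{1/2})] \qquad (9)$$

$$E^r_{1/2} = E^{0'} + \frac{1}{nf}\ln\left(\frac{D_R}{D_O}\right)^{1/2} \qquad (10)$$

$$i_d = \frac{nFA D_O^{1/2} C_O^*}{(\pi t)^{1/2}} \qquad (11)$$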

$D_O$ and $D_R$ are the diffusion coefficients of the oxidized and reduced species (eq. (5)), O and R, respectively; the initial uniform concentration of O is $C_O^*$, that of R is zero, and $t$ is the time after the potential is applied at which current is measured. The quantity $i_d$ is the 'diffusion-controlled current', the maximum current which can be obtained under these conditions. A typical result conforming to eq. (6) is presented in Fig. 1.

Eq. (6) provides a means to calculate the current $i(t)$ at any potential and time, given the values of $i_d$, $E^r_{1/2}$, $\alpha$, $k_s^0$, $D_O$, and $D_R$. Typically $i(t)$ can be measured, for various values of $t$, over the potential range of interest. The objective is to see whether the results conform to eq. (6) and to obtain the values of the kinetic parameters, $\alpha$ and $k_s^0$.
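Eqs. (6)-(9) can be coded directly as a function returning $i/i_d$. The minimal Python sketch below is not the author's code; for large $x$ it switches to the standard asymptotic expansion of $\exp(x^2)\,\mathrm{erfc}(x)$ to avoid overflow, and all parameter values in the usage lines are hypothetical.

```python
import math


def f_of_x(x):
    """f(x) = sqrt(pi) x exp(x^2) erfc(x), which rises from 0 towards 1.

    Direct evaluation is safe in double precision up to x ~ 25; beyond x = 20
    the asymptotic series 1 - 1/(2x^2) + 3/(4x^4) - 15/(8x^6) is used instead.
    """
    if x < 0.0:
        raise ValueError("x must be non-negative")
    if x <= 20.0:
        return math.sqrt(math.pi) * x * math.exp(x * x) * math.erfc(x)
    u = 1.0 / (x * x)
    return 1.0 - 0.5 * u + 0.75 * u * u - 1.875 * u ** 3


def normal_pulse_ratio(e, t, alpha, kappa, e_half, n=1, f=38.92):
    """i(t)/i_d for quasireversible charge transfer, eqs. (6)-(9).

    e, e_half in V; t in s; kappa in s^-1/2 (eq. (8)); f = F/RT in 1/V.
    """
    xi = math.exp(n * f * (e - e_half))             # eq. (9)
    lam = kappa * (1.0 + xi) * xi ** (-alpha)       # eq. (7)
    return f_of_x(lam * math.sqrt(t)) / (1.0 + xi)  # eq. (6) divided by i_d


# Hypothetical parameters: alpha = 0.5, E1/2 = -0.50 V, t = 0.05 s.
for kappa in (1e4, 1.0, 0.1):
    ratio = normal_pulse_ratio(-0.50, 0.05, 0.5, kappa, -0.50)
    print(f"kappa = {kappa:g} s^-1/2: i/i_d at E = E1/2 is {ratio:.3f}")
```

For fast kinetics the current at $E = E^r_{1/2}$ is $i_d/2$; it falls below that as $\kappa$ decreases, which is the kinetic signature the analysis looks for.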
The typical experimental procedure is as follows. First note that as $\lambda^2 t \to \infty$, eq. (6) approaches

$$i(\infty) = i_d\,(1+\xi)^{-1} \qquad (12)$$

and when this is true (i.e., $k_f$ and $k_b$ are large in comparison with the rate of diffusion), $E = E^r_{1/2}$ when $i(t)/i_d = 1/2$. Thus $E^r_{1/2}$ is measured in this way using data from an experiment at times sufficiently long that the kinetic effect is negligible.

When this regime is experimentally inaccessible, $E^r_{1/2}$ may be obtainable through measurement of $E^{0'}$ (eq. (10)). This is done by preparing a solution containing high concentrations of both forms of the redox couple (O and R, eq. (5)) and measuring the potential of an inert electrode placed therein. This route also has problems, in that the reduced form, R, may be unstable.

When $\xi$ is small, that is, when $E$ is well negative of $E^r_{1/2}$, $i$ attains its limiting value of $i_d$ (eq. (12)). Thus, by carrying out the experiment at sufficiently negative potential, $i_d$ is measured.

Depending on the value of $n\alpha$, the approach of $i$ to the value $i_d$ may be very slow, and a slight increase in $i$ with increasingly negative potential may be confused with unwanted contributions to the current from other processes. In the example of Fig. 1, the current has not reached its limiting value at the most extreme potential.

Fig. 1. Normal pulse voltammetric reduction of 0.99 mM Zn(II) in 1.0 M NaNO$_3$, SMDE (small drop), $t_d = 0.5$ s, $t_p = 0.01$ s. (o) Experimental points; (solid line) optimal theoretical curve calculated for $E^r_{1/2} = -0.971$ V, $\alpha = 0.23$, $\log(\kappa t_p^{1/2}) = -0.81$. Abscissa: potential vs. SCE (volts).

The third step is to obtain the current as a function of $t$ and $E$ over ranges of values for which the kinetic effect manifests itself. Using the data from these three steps, the quantity $i(E,t)[1+\xi]/i_d$ is computed for each experimental current. From eq. (6),

$$\frac{i(E,t)\,[1+\xi]}{i_d} = \pi^{1/2} x \exp(x^2)\,\mathrm{erfc}(x) \equiv f(x) \qquad (13)$$

where $x = \lambda t^{1/2}$. Having thus obtained values of $f(x)$, the function is inverted to obtain values of $\lambda t^{1/2}$, and thus $\lambda$. Comparing eqs. (3), (7), and (8),

$$k_f = \frac{D_O^{1/2}\lambda}{1+\xi} \qquad (14)$$

Measurement of $i_d$ as a function of $t$ or $C_O^*$ allows one to determine $D_O$, provided $n$ and $A$ are known. Thus $k_f$ can be calculated. The quantity $\ln k_f(E)$ is then plotted as a function of $E$ to obtain $\alpha$ from the slope according to eq. (3), and $k_s^0$ as the value of $k_f$ at $E = E^{0'}$. Note from eq. (10) that this also requires the value of $D_R$, which may be difficult to measure if R is unstable.
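The conventional route of eqs. (13) and (14) needs $f(x)$ inverted numerically. A bisection sketch is enough because $f$ increases monotonically from 0 towards 1. The diffusion coefficient, time and currents in the usage lines are hypothetical, and the deliberately biased $i_d$ shows how a 1% normalization error propagates into $k_f$.

```python
import math


def f_of_x(x):
    """f(x) = sqrt(pi) x exp(x^2) erfc(x); direct form, adequate for 0 <= x <= 20."""
    return math.sqrt(math.pi) * x * math.exp(x * x) * math.erfc(x)


def invert_f(value, x_max=20.0, tol=1e-12):
    """Solve f(x) = value by bisection; f increases monotonically from 0 towards 1."""
    if not 0.0 < value < f_of_x(x_max):
        raise ValueError("f(x) value outside the invertible range")
    lo, hi = 0.0, x_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_of_x(mid) < value:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kf_from_current(i, i_d, xi, t, d_o):
    """Eq. (13) gives x = lambda t^1/2; eq. (14) gives k_f = D_O^1/2 lambda / (1 + xi)."""
    x = invert_f(i * (1.0 + xi) / i_d)
    lam = x / math.sqrt(t)
    return math.sqrt(d_o) * lam / (1.0 + xi)


# Hypothetical point: D_O = 7e-6 cm^2/s, t = 0.05 s, xi = 0.3, true lambda = 5 s^-1/2.
d_o, t, xi, lam_true, i_d = 7e-6, 0.05, 0.3, 5.0, 1.0
i = i_d * f_of_x(lam_true * math.sqrt(t)) / (1.0 + xi)  # current from eq. (6)
kf_exact = kf_from_current(i, i_d, xi, t, d_o)
kf_biased = kf_from_current(i, 1.01 * i_d, xi, t, d_o)  # i_d overestimated by 1 %
print(f"k_f = {kf_exact:.4e} cm/s; with 1% error in i_d: {kf_biased:.4e} cm/s")
```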

A plot of $\eta$ vs. $\log i$ is a 'Tafel plot' (cf. eq. (1)), and the similar plot of $\log i$ vs. $E$ is usually given the same name. By extension, the plot of $\log k_f$ vs. $E$ is a 'Tafel' or 'Tafel-like' plot. This scheme for obtaining the potential dependence of the rate thus has arisen naturally from the earliest empirical observations.


DATA  ANALYSIS 

The measurement errors associated with this procedure have been described in detail [4]. Even without considering the experimental details, it should be apparent that this procedure and all other procedures which are similar in requiring normalizations and computation of $k_f$ using data from different experiments are unsound. In particular, the result for $\alpha$ is very sensitive to the value of $E^r_{1/2}$.

Consider the following characteristics of the functional form. First, $\xi < 10^{-2}$ for $n(E - E^r_{1/2}) < -120$ mV. Thus, as $i$ approaches $i_d$, eq. (13) becomes independent of $\xi$. Second, for large $x \gg 1$, $f(x)$ (eq. (13)) is insensitive to $\lambda t^{1/2}$. For example, for $x = 2$, $df(x)/dx = 0.073$, but for $x = 10$, $df(x)/dx = 0.00097$. In the range of large $x$, small errors in $i_d$, which cause only small errors in $f(x)$, result in large errors in $x$, and thus in $k_f$.

Fig. 2. Semilogarithmic plot of the data of Fig. 1 according to eq. (3) with various choices of $E^r_{1/2}$ (V): -0.971, -0.981, -0.966 and -0.961. These potentials are indicated by arrows on the figure. The range $0.1 < i/i_d < 0.9$ is also indicated.
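The insensitivity of $f(x)$ at large $x$ can be checked numerically. Differentiating eq. (13) gives $f'(x) = \pi^{1/2}(1 + 2x^2)\exp(x^2)\,\mathrm{erfc}(x) - 2x$; evaluating it at $x = 2$ and $x = 10$ reproduces the contrast described above (roughly 0.07 versus 0.001), and the last column estimates the relative error in $x$ caused by a 0.1% error in $f(x)$. The 0.1% figure is an arbitrary choice for illustration.

```python
import math


def f_of_x(x):
    """f(x) = sqrt(pi) x exp(x^2) erfc(x) from eq. (13)."""
    return math.sqrt(math.pi) * x * math.exp(x * x) * math.erfc(x)


def df_dx(x):
    """Analytic derivative: f'(x) = sqrt(pi) (1 + 2x^2) exp(x^2) erfc(x) - 2x."""
    return math.sqrt(math.pi) * (1.0 + 2.0 * x * x) * math.exp(x * x) * math.erfc(x) - 2.0 * x


for x in (2.0, 10.0):
    slope = df_dx(x)
    dx = 0.001 * f_of_x(x) / slope  # first-order error in x from a 0.1 % error in f
    print(f"x = {x:4.1f}  f = {f_of_x(x):.5f}  df/dx = {slope:.5f}  "
          f"relative error in x = {dx / x:.2%}")
```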

Third, $(1 + \xi)$ increases exponentially for $E > E^r_{1/2}$, and thus small errors in $i$ at small currents can cause $f(x)$ to be larger than its maximum value of unity. When this is a problem, the analysis is always 'improved' by choosing a more positive value of $E^r_{1/2}$. These points are illustrated in Fig. 2, which displays the data of Fig. 1 according to the scheme presented here with four different values of $E^r_{1/2}$: the optimal value, obtained as described below, and both smaller and larger values.

At potentials of about -1.06 V and more negative, all of the points are the same, because $\xi \ll 1$. Even for the optimal value of $E^r_{1/2}$, the value of $\ln[\lambda/(1+\xi)]$ deviates from the predicted linearity for potentials much more positive than $E^r_{1/2}$, because small errors in $E^r_{1/2}$ or in $i$ are magnified by the large values of $\xi$ used in eq. (13). More positive values of $E^r_{1/2}$ increase the range of linearity, and thus appear to be 'better' values. Conventionally it is felt that experimental errors may dominate outside of the range $0.1 < i/i_d < 0.9$, which is indicated in Fig. 2.

An alternative approach to the analysis of voltammetric data which is statistically sound has been developed and described in detail [5]. To explain this approach, for simplicity we use as an example the kinetic problem just discussed. The model yields a dimensionless current function, $\psi$, here (cf. eq. (6))

$$\psi = (1+\xi)^{-1}\,\pi^{1/2}\lambda t^{1/2} \exp(\lambda^2 t)\,\mathrm{erfc}(\lambda t^{1/2}) \qquad (15)$$

Examining eqs. (15), (7), (8), and (9), the parameters sought are identified as $\alpha$, $\kappa$, and $E^r_{1/2}$. The experimental currents $i(E,t)$ are then analyzed according to the linear equation

$$i(E,t) = a\,\psi(\alpha, \kappa, E^r_{1/2}) + c \qquad (16)$$

by finding the optimal value $(\hat\alpha, \hat\kappa, \hat E^r_{1/2})$ which maximizes the correlation of $i$ with $\psi$ (or minimizes the complement of the correlation coefficient, $(1 - r)$). It is assumed that experimental errors are normally distributed with zero mean. It has been shown that this procedure is equivalent to the method of maximum likelihood.
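The separation in eq. (16) can be sketched as follows: for each trial $(\alpha, \kappa, E^r_{1/2})$ the model values $\psi$ are computed, the scale $a$ and offset $c$ follow from ordinary linear least squares, and the nonlinear search only has to minimize $1 - r$. This Python sketch shows the objective alone on synthetic data (all numbers hypothetical); it is not the COOL code, whose modified simplex search and confidence-ellipsoid calculation are not reproduced here.

```python
import math


def f_of_x(x):
    """sqrt(pi) x exp(x^2) erfc(x), with the asymptotic series beyond x = 20."""
    if x <= 20.0:
        return math.sqrt(math.pi) * x * math.exp(x * x) * math.erfc(x)
    u = 1.0 / (x * x)
    return 1.0 - 0.5 * u + 0.75 * u * u - 1.875 * u ** 3


def psi(e, t, alpha, kappa, e_half, n=1, f=38.92):
    """Dimensionless current of eq. (15), with xi and lambda from eqs. (9) and (7)."""
    xi = math.exp(n * f * (e - e_half))
    lam = kappa * (1.0 + xi) * xi ** (-alpha)
    return f_of_x(lam * math.sqrt(t)) / (1.0 + xi)


def one_minus_r(currents, model):
    """Complement of the correlation coefficient between measured i and model psi."""
    n = len(currents)
    mi, mp = sum(currents) / n, sum(model) / n
    sip = sum((a - mi) * (b - mp) for a, b in zip(currents, model))
    sii = sum((a - mi) ** 2 for a in currents)
    spp = sum((b - mp) ** 2 for b in model)
    return 1.0 - sip / math.sqrt(sii * spp)


def linear_part(currents, model):
    """Least-squares a and c in i = a * psi + c (eq. (16)) for fixed nonlinear parameters."""
    n = len(currents)
    mi, mp = sum(currents) / n, sum(model) / n
    a = (sum((b - mp) * (y - mi) for b, y in zip(model, currents))
         / sum((b - mp) ** 2 for b in model))
    return a, mi - a * mp


# Synthetic voltammogram: alpha = 0.5, kappa = 1 s^-1/2, E1/2 = -0.50 V, t = 0.05 s,
# observed through an arbitrary scale (3.2) and offset (0.05).
potentials = [-0.65 + 0.01 * k for k in range(31)]
true_params = (0.5, 1.0, -0.50)
currents = [3.2 * psi(e, 0.05, *true_params) + 0.05 for e in potentials]

for params in (true_params, (0.5, 1.0, -0.49), (0.4, 1.0, -0.50)):
    model = [psi(e, 0.05, *params) for e in potentials]
    print(params, f"1 - r = {one_minus_r(currents, model):.2e}", linear_part(currents, model))
```

Because $a$ and $c$ are solved in closed form, neither the electrode area nor an offset in the current scale has to be known in advance.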

In addition, the confidence region for the quantity $(\alpha, \kappa, E^r_{1/2})$ is determined at a known level of confidence. The confidence ellipsoid may be described by the intervals $I_\alpha$, $I_\kappa$, $I_E$, where, for example, $I_\alpha$ is the size of the ellipsoid in the $\alpha$ dimension at $\kappa = \hat\kappa$, $E^r_{1/2} = \hat E^r_{1/2}$. The quantity $I_\alpha$ has endpoints $\alpha'$ and $\alpha''$. The values $\alpha'$ and $\alpha''$ are the values of $\alpha$ that lead to a correlation coefficient $r_c = 1 - b(1 - r_m)$ when the correlation is maximized as a function of the other two parameters, $\kappa$ and $E^r_{1/2}$. The interval $I_\alpha$ is not a confidence interval for $\alpha$; it is the size of the confidence ellipsoid along a line passing through the optimum and parallel to the $\alpha$-axis. The value of $b$ is given asymptotically by $\exp(\chi^2/m) = (1 - r_c^2)/(1 - r_m^2)$. When $r_m \approx 1$, as is usually the case,

$$b = \exp(\chi^2/m) \qquad (17)$$

where $m$ is the number of experimental points and $\chi^2$ is the chi-squared statistic for the appropriate level of confidence and three degrees of freedom.
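Eq. (17) and the threshold $r_c = 1 - b(1 - r_m)$ reduce to a few lines. The sketch assumes a 95% confidence level ($\chi^2 \approx 7.815$ for three degrees of freedom, hard-coded rather than computed) and hypothetical values of $m$ and $r_m$; it also compares the approximation with the untruncated relation $(1 - r_c^2) = b(1 - r_m^2)$.

```python
import math

CHI2_95_3DOF = 7.815  # chi-squared quantile, 95 % confidence, 3 degrees of freedom


def ellipsoid_threshold(r_max, m, chi2=CHI2_95_3DOF):
    """b = exp(chi2/m) (eq. (17)) and the boundary correlation r_c = 1 - b (1 - r_m)."""
    b = math.exp(chi2 / m)
    return b, 1.0 - b * (1.0 - r_max)


def ellipsoid_threshold_exact(r_max, m, chi2=CHI2_95_3DOF):
    """Same boundary from (1 - r_c^2) = b (1 - r_m^2), without assuming r_m ~ 1."""
    b = math.exp(chi2 / m)
    return math.sqrt(1.0 - b * (1.0 - r_max ** 2))


# Hypothetical fit: 40 points, best correlation 0.9995.
b, r_c = ellipsoid_threshold(0.9995, 40)
print(f"b = {b:.4f}, r_c = {r_c:.6f}, exact form = {ellipsoid_threshold_exact(0.9995, 40):.6f}")
```

Parameter values whose maximized correlation stays above $r_c$ lie inside the confidence ellipsoid; endpoints such as $\alpha'$ and $\alpha''$ are where that correlation crosses $r_c$.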


THE  COOL  ALGORITHM 

This method has been implemented by means of an algorithm (called the COOL algorithm) which incorporates the modified simplex algorithm to search for the optimal values, and the secant algorithm to calculate the intervals of the confidence ellipsoid. The important features of the procedure in applications to electrochemical kinetics are as follows.

(i) The treatment is independent of $\psi$: any computational technique may be used to calculate any $\psi$ for any model for use with the COOL algorithm.

(ii) The data are not transformed or manipulated prior to analysis.

(iii) No normalizations are required; in particular, no data from other experiments are required.

(iv) Offset in the current scale does not introduce bias.

(v) All of the data are used. There is no requirement that the experimenter truncate the data at some point.

(vi) Confidence regions may be calculated.

Of course, there are other examples of statistically reasonable approaches to this problem. They are, however, remarkably sparse, considering the considerable mathematical sophistication required for any treatment of complex kinetic schemes studied by more complex voltammetric techniques. This general issue has been treated recently by Rusling [6]. From the point of view of the electrochemist, focusing on the experiment, the features which distinguish this approach from those which are superficially comparable are the following. First, the COOL algorithm provides a uniform treatment for all mechanisms and all pulse voltammetric experiments. Second, the separation of the linear and nonlinear parts of the problem according to eq. (16) not only avoids irritating experimental problems (the electrode area need not be known, for example), but is also efficient. Thus interesting problems can be solved in 'real time', that is, in times no longer than a few minutes. Third, perhaps because the nonlinear problem is dealt with directly rather than through a quadratic approximation near the optimum point, the application is surprisingly robust. The experimenter needs to provide only initial estimates of the parameters and the step sizes for the initial simplex. Even silly initial guesses do not significantly slow the approach to the optimum, and there seems to be no problem with false optima. Thus it is a useful rather than a dangerous tool in the hands of a naive experimenter.

Determination of $\psi$ in complex cases

Before presenting applications to kinetic problems we describe briefly the techniques employed to obtain the dimensionless current, $\psi$, for cases more complicated than the simple example of eq. (15).

For any first-order system and experiments with only stepwise changes in potential, the dimensionless current function can be expressed in the form of an integral equation as

(18) 

where  0  and  are  functions  of  time.  This  is 
solved numerically using a simple linear quadrature formula to yield an expression of the form

(19)

where $\psi_m$ is the estimate of $\psi(t)$, $\psi_j$ is the estimate of $\psi$ at $t = jt_p/l$, $t_p$ is the time over which potential is held constant, $l$ is the number of subintervals employed by the quadrature in the interval $t_p$, $s_\gamma = \gamma^{1/2} - (\gamma - 1)^{1/2}$, and $\gamma = m - j + 1$.


EXAMPLES OF QUESTIONS ARISING IN DATA ANALYSIS

We now turn to the discussion of three examples of questions which arise in the analysis of kinetic data.

(i) For slow charge transfer, is the charge transfer coefficient ($\alpha$) a function of potential?

(ii) For cases in which two electrons are transferred, is it necessary to consider both charge-transfer rate processes in the model?

(iii) For two charge transfers coupled by chemical reaction, under what conditions does the chemical cross reaction need to be considered?

We consider each of these questions in turn, keeping in mind the double objective of elucidating mechanism and measuring the values of kinetic parameters.

Is the charge transfer coefficient ($\alpha$) potential-dependent?

Modern theories of adiabatic charge transfer predict an explicit dependence of the rate on such parameters as the energy of reorganization of the molecule in going from the initial state to an excited state and from the excited state to the final state. These theories predict that the rate of charge transfer should depend exponentially on a quadratic function of potential. By examination of eq. (3) it can be seen that this is equivalent to predicting that $\alpha$ depends linearly on potential.

By comparing the theoretical treatment of Marcus with the phenomenological treatment of eqs. (2)-(4) [7], one obtains

$$\alpha = \frac{1}{2} + \frac{nF}{4\lambda}\,(E - E^{0'} - \phi_2) \qquad (20)$$

in which $\lambda$ is the potential-independent standard free energy of activation and $\phi_2$ is the potential drop across the diffuse charge layer in the electrolyte solution near the electrode. The experimental objective, then, is to test the proposition that for an appropriately constrained set of reactions the quantity $\alpha$ of eqs. (3) and (4) has the form given in eq. (20). It should be explicitly noted that eq. (2) does not display activity coefficients. Because charge transfer necessarily involves a change in net charge, the activity coefficients of reactant, product, and transition state will in general be different. Provided that they are potential-independent, activity considerations should not confound efforts to test relation (20) by the analysis of current-potential data.

Consider first the graphical method of analysis based on eq. (6), which assumes that the charge transfer coefficient is independent of potential. If instead α has the form of eq. (20), then a plot of $\ln k_f$ vs. E according to eq. (3) will be curved. A common way of using this method to test eq. (20) is to define α by

$$\alpha = -\frac{1}{nf}\,\frac{d\left[\ln k_f(E)\right]}{dE} \qquad (21)$$


The slope of the curve of $\ln k_f(E)$ vs. E is then determined numerically to give values of α(E). These are plotted against E to test eq. (20) and to obtain the value of the coefficient of potential.
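The point-by-point procedure of eq. (21) amounts to numerical differentiation followed by a straight-line fit. The minimal Python sketch below assumes, purely for illustration, that $\ln k_f$ is exactly quadratic in E, so that α(E) is linear as eq. (20) predicts. The values of n, α0, α1 and E0 are hypothetical. The sketch models neither measurement noise nor the error in $E^r_{1/2}$ that distorts the method in practice.

```python
import numpy as np

# f = F/RT in V^-1 at 25 degC (physical constants rounded)
F_OVER_RT = 96485.0 / (8.314 * 298.15)

def alpha_from_lnkf(E, ln_kf, n):
    """Point-by-point alpha(E) = -(1/(n f)) d ln k_f / dE, as in eq. (21)."""
    # edge_order=2 keeps the end points second-order accurate
    return -np.gradient(ln_kf, E, edge_order=2) / (n * F_OVER_RT)

def slope_alpha_nf(E, alpha, n):
    """Least-squares slope d(alpha n f)/dE, in V^-2, of the alpha(E) points."""
    slope, _intercept = np.polyfit(E, alpha, 1)
    return slope * n * F_OVER_RT

# Hypothetical synthetic case: ln k_f quadratic in E, so alpha(E) is linear
n, alpha0, alpha1, E0 = 2, 0.5, 0.01, -1.0
nf = n * F_OVER_RT
E = np.linspace(-1.10, -0.90, 41)
x = E - E0
ln_kf = np.log(1e-4) - nf * (alpha0 * x + 0.5 * alpha1 * nf * x**2)

alpha = alpha_from_lnkf(E, ln_kf, n)
print("d(alpha n f)/dE =", slope_alpha_nf(E, alpha, n), "V^-2")
print("expected        =", alpha1 * nf**2, "V^-2")
```

Numerical differentiation amplifies noise in measured $\ln k_f$ values. In practice, the slope therefore depends on the range of points used, as with the choice of range in Fig. 3.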

The values of α obtained point by point from the curves of Fig. 2 according to eq. (21) are plotted against potential in Fig. 3. Clearly the result for $E^r_{1/2} = -0.961$ V is 'best', that is, it is linear over the range $0.1 < i/i_d < 0.9$. By choice of range in each case, a slope, $\partial(\alpha n f)/\partial E$, can be determined. For the lines in Fig. 3, the slopes are 26, 9.6, and 21 V$^{-2}$ for $E^r_{1/2} = -0.961$, $-0.971$, and $-0.981$ V, respectively. A predicted value in this case is 70 V$^{-2}$ [8]. Considering the uncertainties involved, this agrees reasonably well with the value of 26 V$^{-2}$. Thus, a


Fig. 3. Plot of α values obtained from Fig. 2 by means of eq. (7) against potential to test eq. (20). Symbols as in Fig. 2; range $0.1 < i/i_d < 0.9$ shown, and $E^r_{1/2}$ values indicated by arrows.
value of $E^r_{1/2}$ which is in error on the positive side (for a reduction) not only improves the linearity of the result but also yields a spurious potential dependence of the charge transfer coefficient.

The literature on this question is confused. There are papers describing charge transfer coefficients which are potential-dependent. Some of these, which report results in accord with theory, have been refuted. These potential dependencies have been inferred from data by methods similar to those described above, or by somewhat more sophisticated methods containing the same fundamental flaws. We have examined this question in detail using the COOL algorithm for analysis of data [9]. Two models were employed. The first is conceptually equivalent to that described by eq. (15) (but incorporates factors to take into account the interfacial charge distribution), and thus has three parameters. The second has the formulation

$$\alpha = \alpha_0 + \alpha_1 n f \left(E - E^r_{1/2}\right) \qquad (21)$$

as suggested by eq. (20). Typical fits according to the four-parameter model yielded values of $(1 - r_m)$ and S/N no better than those of the three-parameter model, with a typical value of $\alpha_1 = 0.0002 \pm 0.0002$ (i.e., $\alpha_1 \le 0.0004$). (Here S is the slope, ay, of eq. (16), and N is the root-mean-square deviation of the experimental points from the optimal theoretical curve.) The predicted value of $\alpha_1$ is 0.013, or $\alpha_1 n^2 f^2 = 19$ [8]. We conclude that α does not depend on potential, the experimental evidence provided by these authors notwithstanding.
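The COOL algorithm itself is not reproduced here. The code is a minimal sketch of the kind of model comparison described above. It assumes a hypothetical logistic wave shape in place of eq. (16), which is not given in this excerpt. It generates synthetic data in which α does not depend on potential. It then fits both a three-parameter model and a four-parameter model with $\alpha = \alpha_0 + \alpha_1 n f (E - E^r_{1/2})$. The names `i_d` and `E_half`, the value n = 2 and the noise level are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

F_OVER_RT = 96485.0 / (8.314 * 298.15)   # f = F/RT, V^-1 at 25 degC
NF = 2 * F_OVER_RT                         # n f with n = 2 (assumed)

def wave3(E, i_d, E_half, alpha):
    """Hypothetical irreversible-wave shape (logistic); NOT the paper's eq. (16)."""
    return i_d / (1.0 + np.exp(alpha * NF * (E - E_half)))

def wave4(E, i_d, E_half, alpha0, alpha1):
    """Same shape with alpha = alpha0 + alpha1 * n f (E - E_half)."""
    alpha = alpha0 + alpha1 * NF * (E - E_half)
    return i_d / (1.0 + np.exp(alpha * NF * (E - E_half)))

def rms(r):
    """Root-mean-square deviation N of the points from the fitted curve."""
    return float(np.sqrt(np.mean(r**2)))

# Synthetic data generated with a potential-independent alpha
rng = np.random.default_rng(0)
E = np.linspace(-1.10, -0.90, 101)
i_obs = wave3(E, 1.0, -1.00, 0.40) + rng.normal(0.0, 0.002, E.size)

p3, _ = curve_fit(wave3, E, i_obs, p0=[0.9, -0.99, 0.5])
# Start the four-parameter fit at the three-parameter optimum with alpha1 = 0
p4, cov4 = curve_fit(wave4, E, i_obs, p0=[*p3, 0.0])

N3 = rms(i_obs - wave3(E, *p3))
N4 = rms(i_obs - wave4(E, *p4))
alpha1, alpha1_se = p4[3], float(np.sqrt(cov4[3, 3]))
print(f"N (3 params) = {N3:.5f}, N (4 params) = {N4:.5f}")
print(f"alpha1 = {alpha1:.5f} +/- {alpha1_se:.5f}")
```

When the extra parameter is not needed, N barely improves and the estimate of α1 is not distinguishable from zero. That is the pattern reported above for the real data.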

The power of the COOL algorithm in this analysis rests in part on the identification of $E^r_{1/2}$ as a parameter. The resilience of the analysis to changes in the laboratory reference potential is illustrated in Fig. 4, which presents results for four nominally identical experiments. Rather than just presenting the confidence intervals, we used a more extensive calculation to compute the boundary of the confidence region in each of the three planes of the parameter space. The deviation of the optimal value of $E^r_{1/2}$ for the curve of panel


Fig. 4. Normal pulse voltammogram for 1 mM Zn(II) in 0.3 M KNO$_3$, SMDE, medium drop size; potentials vs. saturated calomel electrode. Panel 13: experimental points (○), best-fitting theoretical curve (solid curve) and residuals (Δ). Panels 14, 15, 16: confidence regions at 95%; (- -) data of panel 13; (• • •), (-•-•), (*-•-*) are for nominally identical experiments. Optimal values (+). The axes are: (13) i (μA) vs. E (V); (14) k° (10$^{-3}$ cm/s) vs. $E^r_{1/2}$ (V); (15) α vs. $E^r_{1/2}$ (V); (16) k° (10$^{-3}$ cm/s) vs. α.

Fig. 5. Normal pulse voltammogram analyzed with an independent value of $E^r_{1/2}$. Panels 5-8 are equivalent to panels 13-16 of Fig. 4, except that $E^r_{1/2}$ is constrained to be 4 mV positive of the optimal value found in Fig. 4.

Fig. 6. Normal pulse voltammogram analyzed with an independent value of $E^r_{1/2}$. Panels 9-12 are equivalent to panels 5-8 of Fig. 5, respectively, except that $E^r_{1/2}$ is constrained to be 4 mV negative of the optimal value found in Fig. 4.
13 (of Fig. 4) is a systematic error caused by a change in the laboratory reference potential. This error can be seen to have no effect on either the optimal values of the other parameters or the size and shape of the confidence regions.

The conventional procedure employs an independently measured value of $E^r_{1/2}$. The effects of errors in this value on the analysis can be tested by analyzing the data of Fig. 4 by means of the COOL algorithm, but fixing the value of $E^r_{1/2}$. In Fig. 4 the outlying value of $E^r_{1/2}$ is about 4 mV from the mean value. Thus we analyze the data for one of the nominally identical experiments of Fig. 4, with the value of $E^r_{1/2}$ fixed at $(E^r_{1/2})_{\mathrm{opt}} + 0.004$ (V), where $(E^r_{1/2})_{\mathrm{opt}}$ is the optimal value found in the optimization presented in


Fig. 7. (a) Normal pulse voltammogram calculated for a two-step mechanism with $D_O = 6.6 \times 10^{-6}$ cm$^2$ s$^{-1}$, $D_R = 1.6 \times 10^{-5}$ cm$^2$ s$^{-1}$, $E^{0\prime} = -0.991$ V, $k^0_1 = 3.5 \times 10^{-5}$ cm s$^{-1}$, $k^0_2 = 7.1 \times 10^{-2}$ cm s$^{-1}$, $\alpha_1 = \alpha_2 = 0.40$ (○). Optimal theoretical curve for a one-step mechanism: $E^r_{1/2} = -0.9866$ V, $\alpha n = 0.404$, $k^0 = 3.51 \times 10^{-5}$ cm s$^{-1}$ (solid curve). (b) Residuals, i.e. the difference between the current calculated for the two-step mechanism and the current for the optimal one-step mechanism. $t_p = 5$ ms; potentials vs. SSCE.
Fig. 4. The value is fixed by setting the initial step size to zero. The result is shown in Fig. 5. The optimal theoretical curve (calculated from eq. (16) using α, k° and the fixed value of $E^r_{1/2}$) now displays notable systematic variation from the experimental result (Fig. 5, panel 5). In addition, the confidence regions about the optimal parameter values are substantially unsymmetrical. Similar results are obtained when $E^r_{1/2}$ is fixed 0.004 V negative of its optimal value, as shown in Fig. 6. The change in the optimal value of k° resulting from the change in $E^r_{1/2}$ is expected, because k° is just the rate constant at $E = E^{0\prime}$ (cf. eqs. (3), (4), and (10)). More striking is the large change in α. This demonstrates that $E^r_{1/2}$ is properly a parameter of the experiment. Fixing $E^r_{1/2}$ at some value determined in another experiment therefore precludes accurate kinetic analysis.
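The effect of fixing the half-wave potential can be mimicked in the same spirit. The sketch below again uses a hypothetical logistic wave shape rather than eq. (16), and it is not the COOL algorithm. It fits synthetic data once with `E_half` free, and then with `E_half` held 4 mV on either side of the free optimum. It then compares the root-mean-square residual N of the three fits. With this simplified, symmetric wave shape the fitted α need not change as much as in Figs. 5 and 6. What the sketch does show is the growth of the residuals.

```python
import numpy as np
from scipy.optimize import curve_fit

F_OVER_RT = 96485.0 / (8.314 * 298.15)   # f = F/RT, V^-1 at 25 degC
NF = 2 * F_OVER_RT                         # n f with n = 2 (assumed)

def wave(E, i_d, E_half, alpha):
    """Hypothetical logistic wave shape standing in for eq. (16)."""
    return i_d / (1.0 + np.exp(alpha * NF * (E - E_half)))

def rms(r):
    return float(np.sqrt(np.mean(r**2)))

rng = np.random.default_rng(1)
E = np.linspace(-1.10, -0.90, 101)
i_obs = wave(E, 1.0, -1.00, 0.40) + rng.normal(0.0, 0.002, E.size)

# Free fit: E_half is an adjustable parameter of the experiment
p_free, _ = curve_fit(wave, E, i_obs, p0=[0.9, -0.99, 0.5])
results = {"free": (p_free[2], rms(i_obs - wave(E, *p_free)))}

# Constrained fits: E_half held 4 mV positive or negative of the free optimum
for label, shift in (("+4 mV", 0.004), ("-4 mV", -0.004)):
    E_fixed = p_free[1] + shift

    def model(E, i_d, alpha, E_fixed=E_fixed):
        return wave(E, i_d, E_fixed, alpha)

    p, _ = curve_fit(model, E, i_obs, p0=[p_free[0], p_free[2]])
    results[label] = (p[1], rms(i_obs - model(E, *p)))

for label, (alpha, N) in results.items():
    print(f"{label:>6}: alpha = {alpha:.4f}, N = {N:.5f}")
```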


Fig. 8. As Fig. 7, but points are for an experimental voltammogram obtained under conditions nearly identical to those which yielded the values of rate parameters for the two-step mechanism.
Is it necessary to consider more than one charge transfer?

In the case discussed above [9], a second issue involves the detailed mechanism of charge transfer when n = 2. Is it necessary in that case to use the model incorporating two successive charge transfers and thus six rate parameters [4]? Or, inverting the question, can anything be learned about the faster of the two steps by this type of analysis? The issue here is more complex, because the more elaborate mechanism can be expected to exhibit non-monotonic changes in the shape of the response, and thus to produce a non-random pattern of residuals. However, systematic errors in the experiments can have the same effect. A non-random distribution of residuals therefore cannot automatically be attributed to significant, rather than adventitious or trivial, failures of the model. A further problem arises when the model is available only numerically (cf. eq. (19)). Interpreting humps or bumps in the response as arising from specific features of the mechanism (the phrenologic school of kinetics) is always risky. In this case it is foolhardy, as minor changes in the values of parameters can produce quite striking changes in the appearance of the response.
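One common way to attach a number to a "non-random pattern of residuals" is the Wald-Wolfowitz runs test on the signs of the residuals. It is not part of the procedure described here. As the text warns, a significant result cannot by itself tell a mechanistic feature apart from a systematic experimental error. The sketch below applies it to two made-up residual sequences; `runs_test` is a hypothetical helper name.

```python
import numpy as np

def runs_test(residuals):
    """Wald-Wolfowitz runs test on residual signs; returns (runs, z score)."""
    r = np.asarray(residuals, dtype=float)
    signs = np.sign(r[r != 0.0])                 # exact zeros carry no sign
    n1 = int(np.sum(signs > 0))
    n2 = int(np.sum(signs < 0))
    n = n1 + n2
    runs = 1 + int(np.sum(signs[1:] != signs[:-1]))
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n**2 * (n - 1))
    return runs, (runs - mean) / np.sqrt(var)

# A smooth 'hump' pattern gives too few runs; strict alternation gives too many
smooth = np.sin(np.linspace(0.0, 4.0 * np.pi, 100))
alternating = np.tile([1.0, -1.0], 50)
print(runs_test(smooth))
print(runs_test(alternating))
```

A strongly negative z score flags long stretches of same-sign residuals. That is the signature of a systematic misfit, whatever its origin.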

A typical illustration is given in Figs. 7 and 8. Fig. 7 displays the analysis of a calculated voltammogram. The voltammogram was calculated from five parameters for two one-electron transfers. Both rate constants are referred to the standard potential for the overall two-electron process. The calculated voltammogram was then analyzed according to a model for a single slow electron transfer with n = 2 (three parameters). The obvious pattern in the residuals can be compared with that of the experimental example of Fig. 8. The experimental conditions of Fig. 8 are nearly identical to those which produced the data on which the theoretical calculation of Fig. 7 (five parameters) is based. In the experiment, noise and systematic experimental artifacts obscure the interpretation. The pragmatic conclusion is that the simpler model adequately explains the variance in the data. This leaves open the question of whether the data contain information about the faster electron transfer step. Such information might be obtained by analyzing the data of Fig. 8 according to the appropriate model for two successive slow electron transfers.

Is it necessary to consider more than one homogeneous reaction?

A much-studied mechanism is the so-called ECE sequence

$$\mathrm{O_1} + n_1 e \rightleftharpoons \mathrm{R_1}, \qquad E_1^{0\prime} \qquad (22)$$

$$\mathrm{R_1} \xrightarrow{\;k\;} \mathrm{O_2} \qquad (23)$$

$$\mathrm{O_2} + n_2 e \rightleftharpoons \mathrm{R_2}, \qquad E_2^{0\prime} \qquad (24)$$

in which the heterogeneous charge transfers are linked by an intermediate homogeneous reaction, here taken to be irreversible. When $E_2^{0\prime} \gg E_1^{0\prime}$, reaction (24) is more favored than (22), and so the two reactions occur together at the potential for reduction of O$_1$. The reason for interest in this scheme is its potential catalytic significance. Many organic compounds, especially in aqueous solution, display the response expected for this sort of mechanism. However, when $E_2^{0\prime} \gg E_1^{0\prime}$, the homogeneous reaction

$$\mathrm{O_2} + \frac{n_2}{n_1}\,\mathrm{R_1} \rightleftharpoons \mathrm{R_2} + \frac{n_2}{n_1}\,\mathrm{O_1} \qquad (25)$$

is highly favored and provides an alternative route to that of eq. (24) for the transfer of electrons to R$_2$. The questions then arise: under what conditions is reaction (24) important in the overall process, and, when it is important, can it be detected? To phrase the question somewhat differently: under what conditions does the model consisting of reactions (22)-(24) adequately explain the response?

A classical example is the reduction of p-nitrosophenol [10]. Experimental results for p-nitrosophenol are presented in Fig. 9, together with the optimal theoretical curves for the simple model comprising eqs. (22)-(24). To the eye, the correspondence appears adequate. For these data $r_m = 0.998$, and typical values of k are 0.2-1 s$^{-1}$, depending on the experimental conditions. There is considerable advantage in using this method to determine values of k, because adding the second-order reaction, eq. (25), to the model complicates the mathematical formulation enormously. Are the optimal values of k and the associated confidence regions reliable, even if the model is 'wrong' in that it does not incorporate eq. (25)?

Fig. 9. Forward, reverse, and net experimental currents (○) and optimal theoretical curves (solid curves) for the square wave voltammetric reduction of p-nitrosophenol in 20% (v/v) aqueous ethanol, 0.1 M acetic acid, 0.1 M potassium acetate, 0.1 M potassium nitrate, 0.0005% Triton X-100. $E^r_{1/2} = -0.1334$ V, $k = 2.09$ s$^{-1}$, $k_2 = 0$. Potentials vs. SCE (volts).

Unfortunately there does not appear to be a general answer to this question, even if it is restricted to problems involving only two parameters. For the case of Fig. 9 there is reason to believe, on empirical grounds, that the answer is affirmative [11]. This is an unsatisfactory conclusion, in that it relies on an intuitive argument based on example rather than on objective criteria. The present statistical approach deals only with the description of phenomena, and thus cannot deal directly with questions of this type. It could, however, be a useful tool for computational investigations of this and related questions. Although the results could serve only as a guide, computation is so much less expensive than experimentation that this could well be the most efficient way to proceed with the interpretation of kinetic measurements.


CONCLUDING  REMARKS 

These three examples raise issues commonly addressed ad hoc and qualitatively in electrochemical kinetic studies. The optimization technique presented here provides a rigorous evaluation of the correspondence between model and data in near-real-time. This may be used to discriminate between alternative models and to examine the power of the data to yield mechanistic information.

In favorable cases, the algorithm may be used to identify and quantify a minor feature of the mechanism. Equally important, though more difficult to demonstrate convincingly, this approach may be used to show the absence of an effect. Finally, after suitable computational investigation of various types of models, it may permit one to treat rather complex cases using the simplest model which incorporates the feature about which information is sought and which yields an acceptable signal-to-noise ratio (S/N).
Abstract


Simpson, D.G., Guo, S., Sacks, J., Bietz, J.A., Huebner, F. and Nelsen, T., 1991. Relating chromatographic data to measurements of wheat quality: case studies in dimension reduction. Chemometrics and Intelligent Laboratory Systems, 10: 155-167.

Fractionating wheat proteins by reversed-phase high-performance liquid chromatography yields extremely complex chromatograms. The data they contain may relate to many characteristics of milled wheat, such as the volume of a loaf of bread or the texture of the dough produced, but such relationships are not readily apparent from the raw data. We report our experiences with two dimension reduction techniques that are widely cited in the chemometrics literature: principal component analysis and partial least squares (PLS). Each of these methods replaces the original observation vectors by weighted averages of their components, where the weights are selected according to a data-dependent criterion. The analysis proceeds by operating on these weighted averages rather than on the original, high-dimensional data. To elucidate properties of significance tests and other inferences, we focus on the special case where only one factor is selected. We show how to use simulation to compute the appropriate significance level of the regression on the PLS scores. The common technique of using the F distribution to compute significance levels for PLS regression can be an extremely liberal procedure. The interpretation of PLS weights requires considerable care.


INTRODUCTION

With the advent of modern high-performance liquid chromatography (HPLC) for analyzing proteins from samples of wheat, there is considerable interest in developing the statistical technology for relating these chromatographic fingerprints to the attributes of milled wheat [1]. Viewing a wheat sample as the basic experimental unit, the number of independent observations (wheat samples) is typically small, but the number of characteristics available for study on each observation is large. For instance, there appears to be a multitude of active sites on the chromatogram that might potentially be included in a model for predicting various attributes of the milled wheat. Standard statistical methodology, e.g. multiple linear regression, cannot be applied directly to the raw data. The reason is that the nominal dimension (the number of measurements on each experimental unit) exceeds the number of independent observations, which leads to ill-posed estimation problems.

Dimension-reduction techniques are based on the premise that much of the information collected on each observation is redundant, and that some lower-dimensional transformations of the data contain most of the information. If such transformations can be discovered, then one can in principle use standard statistical methodology on the constructed lower-dimensional data. Two dimension-reduction methods that are widely cited in the chemometrics literature are principal component analysis [2] and partial least squares (PLS) [3]. After describing these methods briefly, we illustrate their use on typical wheat protein chromatographic data. We then offer some preliminary observations on the viability of these methods for investigating the relationships between HPLC patterns and attributes of milled wheat.

In principal component regression the predictor variables are reduced to a smaller number of projections that account for most of their variation [2]. Because the projections are selected independently of the response variable, classical regression theory may be applied to test for significance, to compute prediction intervals, and so on. On the other hand, there is no guarantee that the principal component projections contain adequate information about the relation between the predictor variables and the response. PLS has been proposed as a method for selecting projections that are more informative about the relationships between two sets of variables. It uses the covariances to select projections that account for the joint variation in the two sets. PLS regression, in particular, selects one-dimensional projections of the predictor variables that have large covariance with the response [4,5]. Because the projections depend on the response as well as on the predictor variables, classical regression theory does not strictly apply. For instance, we demonstrate that comparing the PLS F statistic for the regression to the F distribution can be an extremely liberal procedure.

Both the principal component projections and the PLS projections are affected by the choice of scales for the different components of the raw data. Changing the scales differentially can drastically change the nature of the projections selected. For this reason many authors suggest standardizing the raw data componentwise prior to further analysis. In our examples we center but do not standardize. The HPLC measurements at different sites on the chromatogram are in the same unit, and a change of units would affect them all simultaneously. Principal component and PLS factors are unaffected by common scale transformations of the raw components of the data; e.g., the results would be the same if we chose to express absorbance in different units. Applying a nonlinear transformation (e.g., a logarithm) does affect the results, and the selection of an appropriate transformation is an issue for further research. Such preprocessing of the data is often an important ingredient in the success of a dimension-reduction technique [6].
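The invariance to a common change of units, and the sensitivity to differential rescaling, can be checked directly. This sketch uses a small made-up two-column dataset with uncorrelated columns. It computes the first principal direction by a singular value decomposition of the centered data. The same reasoning applies to the PLS direction, which is proportional to $\sum_i (y_i - \bar{y})(x_i - \bar{x})$ and so also only changes direction under differential rescaling.

```python
import numpy as np

def first_pc(X):
    """First principal direction of the column-centered data (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

# Made-up data: two centered, uncorrelated columns with variances 9 and 1
X = np.array([[3.0, 1.0],
              [-3.0, 1.0],
              [3.0, -1.0],
              [-3.0, -1.0]])

u = first_pc(X)
u_common = first_pc(1000.0 * X)                 # same change of units for all columns
u_diff = first_pc(X * np.array([1.0, 10.0]))    # second column rescaled only

# |cosine| between directions (the sign of a direction vector is arbitrary)
print(abs(u @ u_common), abs(u @ u_diff))
```

The common rescaling leaves the direction unchanged. Multiplying only the second column by 10 makes it dominate, and the first principal direction switches to that column.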

There are different versions of PLS and different recommendations about how to choose the number of projections for regression [7]. Our primary interest is in how to interpret the projections and how to make inferences. For this reason we sidestep the other issues and focus on the special case where only one PLS projection of the predictor variables is to be selected. In our regression example this seems appropriate: although each observation has many components, there are few observations, and one explanatory variable ought to be sufficient. The important issue of bias due to variable selection is clearly of broader scope, and our case study may be viewed as a telling example.
PRINCIPAL COMPONENTS

Principal component analysis is a method of investigating a multivariate dataset by looking at orthogonal one-dimensional projections [2]. By multivariate we mean that each experimental unit has a number of measurements associated with it. For instance, a given sample of wheat might be subjected to several different assessments of quality; the different quality measurements then constitute different components of the multivariate quality vector for that sample. Similarly, the HPLC pattern might consist of absorbance at 50 equally spaced points on the time scale; the 50 measurements then comprise a 50-dimensional vector associated with the given wheat sample. The usual goal in principal component analysis is to replace the large number of components on the original scale with a small number of new components. These consist of the orthogonal projections that account for the largest portion of the variation in the dataset at hand.

A direction vector is a vector of unit length, where the length of an arbitrary vector $x = (x_1, x_2, \ldots, x_p)'$ is given by

$$\|x\| = \sqrt{x'x} = \sqrt{x_1^2 + x_2^2 + \cdots + x_p^2}.$$

If $\|x\| \neq 0$, then $u = x/\|x\|$ is the direction vector for x. If y is another vector with the same number of components, then its projection on x is

$$\frac{y'x}{\|x\|} = \frac{y_1x_1 + \cdots + y_px_p}{\|x\|} = y'u.$$

The number $y'u$ is the component of y in the direction of u. For example, suppose u is the direction vector $(1, 0, 0, \ldots, 0)'$. Then $y'u = y_1$, the first component of y.

A key idea in dimension-reduction is the projection of a dataset. Suppose a dataset consists of n vectors $x_1, \ldots, x_n$, each having p components,

$$x_i = (x_{i1}, \ldots, x_{ip})', \qquad i = 1, \ldots, n.$$

Projecting each of these vectors on a direction vector u yields a new dataset of one-dimensional observations, the components of $x_1, \ldots, x_n$ in the direction of u:

$$x_1'u, \ldots, x_n'u.$$
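These definitions translate directly into a few lines of NumPy. The sketch below uses made-up vectors; `direction` is a hypothetical helper name, not something defined in the text. Stacking the observations as rows of a matrix X turns the projected dataset $x_1'u, \ldots, x_n'u$ into a single matrix-vector product.

```python
import numpy as np

def direction(x):
    """Direction vector u = x / ||x||, defined only when ||x|| != 0."""
    x = np.asarray(x, dtype=float)
    norm = np.sqrt(x @ x)                    # ||x|| = sqrt(x'x)
    if norm == 0.0:
        raise ValueError("the zero vector has no direction")
    return x / norm

x = np.array([3.0, 4.0, 0.0])
y = np.array([2.0, -1.0, 5.0])
u = direction(x)                             # (0.6, 0.8, 0.0)

comp = y @ u                                 # component of y in direction u: y'x/||x||
first = y @ direction([1.0, 0.0, 0.0])       # y'u with u = (1, 0, 0)': equals y_1

# Projecting a whole dataset (rows x_1', ..., x_n') on u gives x_1'u, ..., x_n'u
X = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])
scores = X @ u
print(comp, first, scores)
```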


Given a set of numbers $\{x_1, x_2, \ldots, x_n\}$, a common measure of variation is the sample variance about the mean:

$$V(x_1, \ldots, x_n) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2,$$

where

$$\bar{x} = \frac{x_1 + \cdots + x_n}{n}.$$

The first principal component is obtained by finding the direction $u_1$ such that the projection of the dataset has maximal sample variance, that is,

$$V(x_1'u_1, \ldots, x_n'u_1) = \max_{\|u\| = 1} V(x_1'u, \ldots, x_n'u).$$

The second principal component is obtained by maximizing the variance of the projections on directions orthogonal to $u_1$. In general, the kth principal component maximizes the variance of the projections on directions orthogonal to $(u_1, \ldots, u_{k-1})$.
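Computationally, these successive maximizations are usually solved by an eigenvalue decomposition of the sample covariance matrix. A minimal sketch with NumPy on made-up data is given below; `principal_components` is a hypothetical helper name. The check is that the variances of the projections equal the eigenvalues, and that no random unit direction yields a larger projected variance than $u_1$.

```python
import numpy as np

def principal_components(X, k):
    """First k principal directions u_1..u_k and the variances of the projections.

    Each u_j maximizes the sample variance of x_1'u, ..., x_n'u over unit
    vectors orthogonal to u_1..u_{j-1}; these are the eigenvectors of the
    sample covariance matrix taken in order of decreasing eigenvalue.
    """
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]            # divisor n (does not affect the directions)
    evals, evecs = np.linalg.eigh(S)      # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:k]
    return evecs[:, order], evals[order]

rng = np.random.default_rng(0)
# Hypothetical data: 30 observations of 5 components with unequal spreads
X = rng.normal(size=(30, 5)) * np.array([5.0, 1.0, 0.5, 0.3, 0.1])
U, var = principal_components(X, 2)

scores = (X - X.mean(axis=0)) @ U         # projections on u_1 and u_2
print(var, scores.var(axis=0))            # variances of the projections = eigenvalues
```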

In using this construction for dimension-reduction, the hope is that most of the relevant variation is accounted for by the first few principal components. For instance, it might be that most of the variation in a set of chromatograms is accounted for by a few peaks.

A number of software packages and programs have routines for principal components analysis, including BMDP, Minitab, SAS, and Unscrambler. In addition, programs that perform the eigenvalue decompositions needed to obtain the principal components are widely available, e.g., LINPACK and S.


PLS  PROJECTIONS 

Principal component analysis attempts to produce a small number of directions that capture most of the variation in a single set of vector observations. An alternative dimension-reduction technique has been proposed in the chemometrics literature for cases where the goal is to relate two sets of vectors.

Given pairs of vectors $(x_1, y_1), \ldots, (x_n, y_n)$, the idea is to find directions $u_1$ and $v_1$ such that the projections of $x_1, \ldots, x_n$ on $u_1$ and the projections of $y_1, \ldots, y_n$ on $v_1$ have large coincident variation.
This is the basis of the PLS algorithm [3], which uses the projections on these directions as the input variables for least-squares regression.

Specifically, for pairs of numbers $(x_1, y_1), \ldots, (x_n, y_n)$ the sample covariance is given by

$$C(x_1, \ldots, x_n; y_1, \ldots, y_n) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

and provides a measure of the extent to which the x and y values tend to vary together. PLS uses the covariance as a criterion for selecting the projection directions $u_1$ and $v_1$:

$$C(x_1'u_1, \ldots, x_n'u_1; y_1'v_1, \ldots, y_n'v_1) = \max_{\|u\| = 1}\,\max_{\|v\| = 1} C(x_1'u, \ldots, x_n'u; y_1'v, \ldots, y_n'v).$$

As in principal component analysis, one can iterate the procedure and select additional direction vectors that maximize the covariance in directions orthogonal to previously selected projections. PLS has almost invariably been described in algorithmic form, but Frank [4] and Hoskuldsson [5] have pointed out that the algorithm selects covariance-maximizing directions.
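For general x and y, the double maximization has a closed form. The maximum of $u' S_{xy} v$ over unit vectors u and v is the largest singular value of the sample cross-covariance matrix $S_{xy}$, and it is attained at the leading left and right singular vectors. This is a standard linear-algebra fact, not a description of how any particular PLS program is coded. The sketch below checks it on made-up data with NumPy; the helper names are hypothetical.

```python
import numpy as np

def first_pls_pair(X, Y):
    """Unit vectors (u_1, v_1) maximizing C(x_1'u, ..., x_n'u; y_1'v, ..., y_n'v).

    The maximum of u' S_xy v over unit u and v is the largest singular value
    of the cross-covariance matrix S_xy, attained at its leading singular vectors.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxy = Xc.T @ Yc / X.shape[0]          # p x q sample cross-covariance (divisor n)
    U, s, Vt = np.linalg.svd(Sxy)
    return U[:, 0], Vt[0], s[0]

def cov(a, b):
    """Sample covariance with divisor n."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))

rng = np.random.default_rng(4)
X = rng.normal(size=(25, 8))                                   # hypothetical x data
Y = X[:, :2] @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(25, 3))
u1, v1, c_max = first_pls_pair(X, Y)
print(c_max, cov(X @ u1, Y @ v1))         # the two numbers agree
```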

PLS provides a simultaneous dimension-reduction for x and y. For the special case with either x or y one-dimensional, the solution can be written down explicitly. Suppose $y_i$ is a scalar, for $i = 1, \ldots, n$. Then the solution is given by

$$u_1 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\left\|\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})\right\|}, \qquad v_1 = 1,$$

where $\bar{x}$ is the vector of componentwise sample means for $x_1, \ldots, x_n$. In this case $u_1$ may be recognized as the direction of the vector of slopes from the least-squares regression of x on y. In general the PLS algorithm is easily programmed. It has been implemented in the program Unscrambler, which is available for IBM-PC compatibles. We have programmed PLS regression in S and FORTRAN.
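The explicit one-dimensional solution can be checked numerically. The sketch below uses made-up data with more components than observations; the helper names are hypothetical. It computes $u_1$ from the formula above and confirms that it points along the vector of slopes from regressing each component of x on y. It also confirms that no random unit direction gives a larger sample covariance with y.

```python
import numpy as np

def pls1_direction(X, y):
    """First PLS weight u_1 for scalar y (v_1 = 1):
    u_1 is proportional to sum_i (y_i - ybar)(x_i - xbar)."""
    Xc = X - X.mean(axis=0)
    w = Xc.T @ (y - y.mean())
    return w / np.linalg.norm(w)

def sample_cov(a, b):
    return float(np.mean((a - a.mean()) * (b - b.mean())))   # divisor n

rng = np.random.default_rng(3)
n, p = 15, 40                                   # more components than observations
X = rng.normal(size=(n, p))
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=n)     # hypothetical response

u1 = pls1_direction(X, y)

# u_1 has the direction of the slopes from regressing each column of x on y
yc = y - y.mean()
slopes = (X - X.mean(axis=0)).T @ yc / (yc @ yc)
same_direction = np.allclose(u1, slopes / np.linalg.norm(slopes))

# No other unit direction gives a larger sample covariance with y
best = sample_cov(X @ u1, y)
rand_dirs = rng.normal(size=(500, p))
rand_dirs /= np.linalg.norm(rand_dirs, axis=1, keepdims=True)
is_max = all(sample_cov(X @ d, y) <= best + 1e-12 for d in rand_dirs)
print(same_direction, is_max)
```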

PLS bears a resemblance to canonical correlation analysis (CCA), in which projections of $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ are selected to maximize correlation [8]. The CCA directions are the ones with the strongest linear association for the data at hand, whereas the PLS directions have the highest coincident variation. Unfortunately, CCA is ill-posed in the present setting, where the nominal dimension of the data exceeds the number of independent observations. One can achieve perfect sample correlation by weighting any $n - 1$ linearly independent columns of the data matrix.
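The ill-posedness is easy to demonstrate. The sketch below uses made-up data with n = 10 observations and p = 50 components, and y generated independently of X. Least squares on the first $n - 1$ centered columns reproduces the centered y exactly. The weighted combination therefore has sample correlation 1 with y, even though there is no relationship at all.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 10, 50                       # nominal dimension exceeds observations
X = rng.normal(size=(n, p))
y = rng.normal(size=n)              # response unrelated to X

Xc = X - X.mean(axis=0)
yc = y - y.mean()
# Use any n - 1 linearly independent columns (here the first n - 1)
A = Xc[:, : n - 1]
w, *_ = np.linalg.lstsq(A, yc, rcond=None)   # weights reproducing yc exactly
proj = A @ w

r = np.corrcoef(proj, y)[0, 1]
print(r)   # sample correlation 1 (up to rounding) despite no real relation
```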

REGRESSION  ON  CONSTRUCTED  COMPONENTS 

Consider the case where y has only one component, whereas x is of high dimension. This is the case in the examples below, where y is a particular attribute of milled wheat and x is the HPLC determination of protein composition. Recall that the regression of y on x is ill-posed if the number of components of x exceeds the number of observations. PLS attempts to circumvent this problem by regressing y on the linear combinations of x selected according to the maximum covariance criterion. Similarly, principal component regression involves regressing y on the linear combinations of x selected by principal component analysis. In each case one uses the constructed variables $z_1 = x'u_1$, $z_2 = x'u_2$, and so on as the regression variables for predicting y. In the case of principal components, the ordinary theory of multiple linear regression can be used to compute standard errors and prediction intervals, because no information about y was used in the construction of $z_1, z_2$, etc. In the case of PLS, the usual theory is inappropriate because the constructed $z_1, z_2, \ldots$ depend on y. Further discussion of this point is given below. Ordinary principal component regression can clearly fail if the linear combinations of x with the largest variability have little relation to y. PLS is an attempt to avoid this pitfall by selecting linear combinations that vary together with y.
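The liberality of the usual F test for PLS regression can be seen in a small simulation. The sketch below assumes independent standard normal predictors and a response unrelated to them, with n = 20 and p = 100. It regresses y on a single constructed component and counts how often the F statistic exceeds the nominal 5% point of F(1, 18), taken as 4.41 from standard tables. It illustrates the general idea of calibrating the test by simulation. It is not the authors' simulation design, and the simulated critical value applies only to this assumed setting.

```python
import numpy as np

def f_stat(z, y):
    """F statistic (1 and n-2 df) for the simple regression of y on z."""
    zc, yc = z - z.mean(), y - y.mean()
    r2 = (zc @ yc) ** 2 / ((zc @ zc) * (yc @ yc))
    return r2 * (len(y) - 2) / (1.0 - r2)

def pls1_scores(X, y):
    """z_i = x_i'u_1 with u_1 the first PLS direction (u_1 depends on y)."""
    Xc = X - X.mean(axis=0)
    w = Xc.T @ (y - y.mean())
    return Xc @ (w / np.linalg.norm(w))

def pc1_scores(X):
    """Scores on the first principal component (chosen without using y)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]

rng = np.random.default_rng(6)
n, p, n_sim = 20, 100, 2000
F_CRIT = 4.41   # upper 5% point of F(1, 18) from standard tables (n = 20)

f_pls = np.empty(n_sim)
f_pc = np.empty(n_sim)
for s in range(n_sim):
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)          # null hypothesis: y unrelated to X
    f_pls[s] = f_stat(pls1_scores(X, y), y)
    f_pc[s] = f_stat(pc1_scores(X), y)

print("nominal level:      0.05")
print("PC1 rejection rate: ", np.mean(f_pc > F_CRIT))
print("PLS1 rejection rate:", np.mean(f_pls > F_CRIT))
print("simulated 95% point of the PLS1 F:", np.quantile(f_pls, 0.95))
```

Because the principal component is chosen without reference to y, its F statistic follows the nominal distribution. The PLS score is built from y, so comparing its F statistic with F(1, 18) rejects far too often. The simulated 95% point is the appropriate critical value in this assumed setting.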

A  CLASSIFICATION  EXAMPLE 

The first example is a dataset consisting of HPLC runs of 43 samples of durum wheat. There are two groups, labeled ‘42’ and ‘45’ depending on which of two proteins is present at a certain locus on the chromosome, as determined by electrophoresis. It has been found that the presence of protein ‘42’ indicates a variety with weak pasta quality, whereas protein ‘45’ indicates a strong variety. This example offers a test of whether the dimension-reduction techniques can ‘discover’ this relationship. The experimental technique for the HPLC is described in ref. 1.

Figs. 1A and B show the chromatograms (absorbance versus time) for the group 42 and group 45 samples. Each chromatogram contains 330 equally spaced measurements over the range 5-60 min. The most striking difference is that the group 42 samples have a sharp peak at 49 min that is absent from the group 45 samples. Conversely, group 45 has a large peak at 44 min that is absent in group 42. Presumably this difference in HPLC results for the two groups is a reflection of the two proteins identified by electrophoresis; Burnouf and Bietz [1] cited it as evidence that HPLC could be used to identify strong and weak varieties. There is a minor peak evident at 18 min for group 45 but not for group 42. This peak was present only in five analyses of one variety (Langdon), so its appearance in group 45 seems coincidental.

As the difference between the two groups is obvious in Fig. 1, any reasonable procedure ought to be able to recover it. We employed composite classification rules in which we first selected one or two orthogonal weight vectors by principal components or PLS, and then applied Fisher's linear discriminant rule [9] to the scores obtained by projecting the data on the weight vectors. The effect of this composite rule is to select a single direction vector, say w, that is a linear combination of the original direction vectors selected by principal component analysis or PLS. The composite discriminant rule is equivalent to assigning a candidate chromatogram to the group whose mean projection on w is closest to its own projection.
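A minimal sketch of the composite rule, assuming NumPy, exactly two groups, and Fisher's rule with a pooled within-group covariance of the scores. `composite_rule` is a hypothetical name, not the authors' code. It shows how k weight vectors (the columns of U) collapse to the single direction $w = Ua$, and how classification then reduces to finding the nearest group mean projection on w.

```python
import numpy as np

def composite_rule(X, labels, U):
    """Project on the weight vectors in U, then apply Fisher's two-group rule.

    X: (n, p) data; labels: n group codes (exactly two distinct values);
    U: (p,) or (p, k) weight vectors from PCA or PLS.
    Returns the implied single direction w = U a and a classify(x) function.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    U = np.asarray(U, dtype=float).reshape(X.shape[1], -1)   # ensure shape (p, k)
    g1, g2 = np.unique(labels)
    Z = X @ U                                    # projection scores, shape (n, k)
    Z1, Z2 = Z[labels == g1], Z[labels == g2]
    m1, m2 = Z1.mean(axis=0), Z2.mean(axis=0)
    # Pooled within-group covariance of the scores.
    S = ((Z1 - m1).T @ (Z1 - m1) + (Z2 - m2).T @ (Z2 - m2)) / (len(Z) - 2)
    a = np.linalg.solve(S, m1 - m2)              # Fisher coefficients in score space
    w = U @ a                                    # one direction in the original space
    c1, c2 = m1 @ a, m2 @ a                      # group mean projections on w

    def classify(x):
        s = np.asarray(x, dtype=float) @ w
        # Nearest mean projection, i.e. compare s with the cutoff (c1 + c2) / 2.
        return g1 if abs(s - c1) <= abs(s - c2) else g2

    return w, classify
```

With a single weight vector (k = 1) the Fisher step only rescales u, so the rule becomes the midpoint cutoff on the original scores.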

Fig. 2A shows the first two eigenvectors from principal component analysis. Fig. 2B shows the first PLS weighting vector and the weighted average of the first two principal components selected by the two-dimensional PC linear discriminant. The components of a weight vector $u = (u_1, \ldots, u_{330})'$ give the weights for the time-ordered sites on the chromatogram in the constructed variable

$z = x'u = u_1 x_1 + u_2 x_2 + \cdots + u_{330} x_{330}$.

Fig. 2. Weight vectors for centered chromatograms of durum wheat samples: (A) first two eigenvectors from PCA; (B) PLS projection and linear discriminant projection based on the first two principal components.

As described above, the principal component weights do not use the classification information, but simply give the direction of the most variable projection of the chromatograms. On the other hand, the PLS weights give the projection direction having the largest covariance with the group labels, coded, for instance, as 0s and 1s. (If there were more than two groups we would have to introduce a vector of binary variables for group labeling.) It can be shown that if y is binary the PLS weight vector is simply the direction of the difference between the componentwise averages for the two groups; in the present case, the difference between the mean chromatograms for the two groups.
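The claim follows in one line: because $\sum_i (y_i - \bar y) = 0$, centering X does not change $X'(y - \bar y\mathbf{1})$, and with 0/1 coding $\sum_i x_i (y_i - \bar y) = \frac{n_0 n_1}{n}(\bar x_1 - \bar x_0)$, where $n_0, n_1$ are the group sizes and $\bar x_0, \bar x_1$ the group mean chromatograms. A small NumPy sketch (function names hypothetical) computes both directions so they can be compared.

```python
import numpy as np

def pls_weights_binary(X, y):
    """First PLS weight vector for a 0/1 response: Xc' yc, normalized."""
    Xc = X - X.mean(axis=0)
    w = Xc.T @ (y - y.mean())
    return w / np.linalg.norm(w)

def mean_difference_direction(X, y):
    """Unit vector along the difference of the two group mean chromatograms."""
    d = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return d / np.linalg.norm(d)
```

Since the proportionality constant $n_0 n_1 / n$ is positive, the two unit vectors agree exactly, sign included.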

The first principal component weights in Fig. 2A appear to confound the two peaks noted above with several other sites on the chromatogram. The second component appears to cancel out most of the other sites, allowing us to recover the difference between the two main peaks of interest with a bivariate linear discriminant. It is clear from Fig. 2B that the PLS factor is weighting primarily on the difference between the two major peaks noted previously. The weighting vector that results from applying the bivariate linear discriminant to the first two principal components is similar to the PLS weighting vector except that the former gives more weight to gliadins eluting beyond 50 min.

Fig. 3 is an indication of the effectiveness of the constructed classification variables. The vertical axis is the group label. The horizontal axis is the value of the score, $z = x'u$, for each of the 43 samples. In each plot the vertical line is the cutoff value for the linear discriminant rule, which is given by $(\bar z_1 + \bar z_2)/2$, where $\bar z_1$ and $\bar z_2$ are the mean scores for the two groups. The first principal component scores, shown in Fig. 3A, are not very effective for classifying the two groups. Adding the second component (Fig. 3B) reduces the error rate dramatically. The PLS scores, shown in Fig. 3C, provide a complete separation of the two groups. The apparent error rates and leave-one-out cross-validation (CV) estimates of the error rates [10] are as follows.


Method                              Apparent error rate   CV error rate
Principal component analysis (I)    12/43                 13/43
Principal component analysis (II)   1/43                  2/43
PLS                                 0/43                  1/43

The apparent error rate is known to be optimistic; the CV estimate is generally considered to be more reliable.
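A sketch of the two error-rate estimates for the one-direction PLS rule, assuming NumPy and labels coded 0/1 (helper names hypothetical). The essential detail in the cross-validation loop is that the PLS direction and the group means are recomputed without the held-out sample; reusing the full-data direction would make the CV estimate optimistic in the same way as the apparent error rate.

```python
import numpy as np

def pls_nearest_mean_predict(X_train, y_train, X_test):
    """One PLS direction plus nearest projected group mean; labels coded 0/1."""
    Xc = X_train - X_train.mean(axis=0)
    w = Xc.T @ (y_train - y_train.mean())     # first PLS weights (unnormalized)
    z = X_train @ w
    m0, m1 = z[y_train == 0].mean(), z[y_train == 1].mean()
    s = X_test @ w
    return (np.abs(s - m1) < np.abs(s - m0)).astype(int)

def apparent_error(X, y):
    """Error rate when the rule is evaluated on the data used to build it."""
    return float(np.mean(pls_nearest_mean_predict(X, y, X) != y))

def loo_cv_error(X, y):
    """Leave-one-out estimate: refit the direction and means without sample i."""
    n = len(y)
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i
        pred = pls_nearest_mean_predict(X[keep], y[keep], X[i:i + 1])
        errors += int(pred[0] != y[i])
    return errors / n
```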

When only one PLS projection is selected, applying the linear discriminant rule to the PLS scores is equivalent to using a rule that assigns a new observation to the group whose mean is closest in Euclidean norm [11]; that is, it assigns a variety with chromatogram $x = (x_1, \ldots, x_p)'$ to the group with mean vector $\bar x_g$ for which $\| x - \bar x_g \|$ is smallest. This procedure, known as Euclidean distance classification [12,13], has an obvious generalization to several groups.
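Euclidean distance classification takes only a few lines. The sketch below (NumPy assumed, function name hypothetical) works for any number of groups. For two groups the agreement with the single-PLS-direction rule follows from the identity $\|x-\bar x_1\|^2 - \|x-\bar x_0\|^2 = -2(\bar x_1 - \bar x_0)'\left(x - \tfrac{1}{2}(\bar x_0 + \bar x_1)\right)$, since the first PLS weight vector points along $\bar x_1 - \bar x_0$.

```python
import numpy as np

def euclidean_distance_classify(x, group_means):
    """Assign x to the group whose mean vector is nearest in Euclidean norm.

    group_means: dict mapping group label -> mean chromatogram (1-D array).
    """
    x = np.asarray(x, dtype=float)
    return min(group_means, key=lambda g: np.linalg.norm(x - group_means[g]))
```

For example, with means (0, 0), (5, 0) and (0, 5) labeled "a", "b" and "c", the point (4, 1) is assigned to "b".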


The PLS classification is highly effective in this example, and it identifies the gliadin peaks associated with pasta quality. Classification by PCA can achieve nearly the same results, but it requires two components and a bivariate linear discriminant, so it takes a bit more effort. An alternative principal component method is to use SIMCA, which takes the grouping into account by finding separate principal component projections for the different groups in the training data [14]. It is not clear that there is much to gain by using more complex methods in the present example. In other examples, e.g. when there is doubt that all of the observations fall in known groups, other methods might well yield superior results.


Fig. 3. Linear discriminant classification of durum wheat samples using (A) first principal component, (B) first two principal components, and (C) first partial least-squares projection. Panels: PCA(I), PCA(II) and PLS classification, each showing the group assignment (classified 42 / classified 45) against the score (thousands).

A  REGRESSION  EXAMPLE

The second example concerns a dataset containing measurements on twelve varieties of hard red spring wheat. For each variety we have HPLC results of proteins extracted with 80% ethanol. In addition, a number of different kinds of measurements were made of the physical properties of the grain, its milling properties, and its mixing and baking properties as described by Nolle et al. [15]. We selected three for detailed study: (i) loaf volume, the volume of a baked loaf of bread from a given amount of flour; (ii) mix time, the amount of mixing required for the dough to achieve a certain consistency; and (iii) percentage wheat ash, a measure of the mineral content. Loaf volume and mix time were selected because they are known to be related to the proteins of wheat. Ash was selected as a negative control in our data analysis experiment, since it is a variable thought to be unrelated to the protein composition.

Fig. 4A shows the chromatograms for the twelve varieties. Each chromatogram contains 514 measurements at the rate of 12 per minute starting at 5 min. For the purpose of relating protein content to the various attributes, an important issue is the variability at the different sites, which can be seen more clearly from the mean-centered chromatograms in Fig. 4B. For instance, the raw chromatograms in Fig. 4A have a strong peak around 26 min that shows very little variation across samples and appears only as a small bump in Fig. 4B.

Fig. 4. (A) Chromatograms for twelve samples of wheat grown in Mesa, AZ; (B) mean-centered chromatograms; (C) first two eigenvectors from PCA.

Fig. 5. Standard deviations (solid line) and cumulative proportions of variance (dashed line) for principal components with nonzero eigenvalues.

Centered chromatograms were computed as follows:

1. Compute the vector of componentwise means $\bar x' = (\bar x_1, \bar x_2, \ldots, \bar x_{514})$, where $\bar x_j$ is the mean of the twelve absorbance measurements for the $j$th time point.

2. Subtract the components of $\bar x$ from the corresponding components of each of the twelve individual chromatograms.
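The two centering steps, the median-centered alternative discussed in the text, and the component summary plotted in Fig. 5 can be sketched with NumPy as follows. The function names and the relative tolerance used to drop numerically zero eigenvalues are our assumptions; with twelve centered chromatograms at most eleven eigenvalues are nonzero.

```python
import numpy as np

def mean_center(X):
    """Steps 1-2: subtract the per-time-point means xbar_j from every chromatogram."""
    xbar = X.mean(axis=0)
    return X - xbar, xbar

def median_center(X):
    """Alternative that is less affected by a few unusual chromatograms."""
    return X - np.median(X, axis=0)

def pc_variance_summary(X, rel_tol=1e-10):
    """Standard deviations of the principal components with nonzero eigenvalues and
    the cumulative proportion of total variance they account for."""
    Xc, _ = mean_center(X)
    s = np.linalg.svd(Xc, compute_uv=False)     # singular values, largest first
    s = s[s > rel_tol * s[0]]                   # drop numerically zero components
    var = s**2 / (X.shape[0] - 1)               # eigenvalues of the sample covariance
    return np.sqrt(var), np.cumsum(var) / var.sum()
```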


Fig. 6. Weight vectors for centered chromatograms of twelve wheat samples: (A) PLS weights for ash, (B) PLS weights for mix time, (C) PLS weights for loaf volume.
In some instances there may be unusual chromatograms that have a large effect on the mean centering. In such cases it is useful to plot median-centered chromatograms for comparison.

The first two components from principal component analysis are shown in Fig. 4C. The largest source of variation is a peak or pair of peaks eluting at 27-28 min. The first component is essentially a difference across this region of the chromatogram. The second component has contributions from many sites, with no apparent dominant contributor. Three or four components are required to account for the bulk of the variation in the chromatograms. Fig. 5 shows the standard deviations (solid line) and cumulative proportions of total variance (dashed line) for the principal components.

We next carried out the PLS computations to relate ash, mix time and loaf volume to the HPLC profiles. Although one can treat linear combinations of the three quality variables using PLS, we treated them one at a time because we wished to compare the predictability of these three attributes using the different methods. Figs. 6A-C show the first PLS weight vectors for ash, mix time and loaf volume. The magnitudes of the weights indicate the relative importance of the different sites on the chromatogram according to the criterion used to select the projection.

Fig. 7. Scatter plots for response variables versus PLS scores (A-C) and PC scores (D-F): (A) ash, (B) mix time and (C) loaf volume versus PLS score; (D) ash, (E) mix time and (F) loaf volume versus PC score.

We were initially surprised at Fig. 6A for ash, which seemed to indicate that proteins eluting at 27-28 min were important for predicting ash. However, an explanation can be found by comparison with the first principal component in Fig. 4C, which is quite similar. Recall that the PLS factor is the direction with the largest covariance with ash. It is plausible that ash varies as the total protein content varies (more protein means less ash). Variation in total protein content would in turn be connected with the variation in the principal component. It so happens that peaks in the indicated region show substantially greater variation than the other sites, so these show up in both the PLS factor for ash and the principal component. Contrary to the impression conveyed by Fig. 6A, it is doubtful that the proteins eluting at 27-28 min have any causative relationship with ash content. Instead, it is quite likely that they receive the highest weights simply because they account for the largest portion of the variability in the chromatograms and, consequently, the variation in total protein content.
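The ash effect can be mimicked with hypothetical data in which a single "total protein" factor t drives most of the variation at ten of 100 sites and the response merely tracks t. In this NumPy sketch (all numbers invented for illustration, function names ours) the first PLS weight vector nearly coincides with the first principal component and puts its largest weights on the high-variance sites, although those sites relate to the response only through t.

```python
import numpy as np

def first_pc_weights(X):
    """First eigenvector of the centered data (ignores the response)."""
    Xc = X - X.mean(axis=0)
    return np.linalg.svd(Xc, full_matrices=False)[2][0]

def first_pls_weights(X, y):
    """Direction of largest covariance with y: Xc' yc, normalized."""
    Xc = X - X.mean(axis=0)
    w = Xc.T @ (y - y.mean())
    return w / np.linalg.norm(w)

def abs_cosine(u, v):
    """Similarity of two weight vectors, ignoring their arbitrary signs."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical data: 12 samples, 100 sites, a dominant factor loading on sites 40-49.
rng = np.random.default_rng(7)
t = rng.normal(size=12)                      # latent "total protein" factor
a = np.full(100, 0.1)
a[40:50] = 5.0                               # high-variance region
X = np.outer(t, a) + 0.1 * rng.normal(size=(12, 100))
y = t + 0.5 * rng.normal(size=12)            # response follows t only
w_pls = first_pls_weights(X, y)
similarity = abs_cosine(first_pc_weights(X), w_pls)
```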

Figs. 7A-C are scatter plots of the three response variables ash, mix time and loaf volume versus their respective PLS scores. Figs. 7D-F show the same response variables plotted against the principal component scores. The least-squares lines for regression on PLS and principal component analysis scores are included as well. All of the PLS scatter plots suggest some positive relationship; however, there is a hidden bias in these plots because each PLS direction was selected to have a relationship with the corresponding response. One manifestation of this bias is inflation of the false positive rate for the so-called F-test for the regression on the PLS scores. The F-test provides a means for assessing the statistical significance of the apparent regression relationship [16]. For simple linear regression, including as a special case regression on the first PCA component, the test statistic has an F distribution with 1 and n - 2 degrees of freedom under the zero-slope hypothesis. This rests on the standard assumption that the noise terms in the regression model are independent and normally distributed with mean zero and a common variance. For regression on the PLS direction this reference distribution is no longer correct, because of the dependence of the direction on the response variable.
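For reference, the F statistic for the simple regression of y on a single score z is the regression sum of squares (1 df) divided by the residual mean square (n - 2 df); equivalently $F = (n-2)r^2/(1-r^2)$. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def simple_regression_F(z, y):
    """F statistic for H0: slope = 0 in y = a + b z + e (1 and n - 2 df)."""
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    zc, yc = z - z.mean(), y - y.mean()
    b = (zc @ yc) / (zc @ zc)          # least-squares slope
    ss_reg = b * b * (zc @ zc)         # regression sum of squares
    ss_res = yc @ yc - ss_reg          # residual sum of squares
    return ss_reg / (ss_res / (n - 2))
```

For z = (1, 2, 3, 4, 5) and y = (2, 4, 5, 4, 5) it gives F = 4.5, that is $r^2 = 0.6$ with 3 residual degrees of freedom. The formula is the same whether z is a principal component score or a PLS score; only the reference distribution differs.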

Fig. 8. Percentile-percentile plot of 5000 Monte Carlo-generated F statistics for regression on the PLS factor versus the F distribution.

To get an approximation to the correct reference distribution we generated 5000 random samples of size 12 from the normal distribution, using each sample of responses to get the PLS direction for the twelve observed chromatograms in our example. Uniform deviates were generated using a multiplicative congruential generator with modulus $2^{31} - 1$ and multiplier $7^5$ [17]. Normally distributed deviates were obtained via the Box-Muller transformation. We assumed unit variance for the responses, but this has no bearing on the results, because the PLS direction vector and the F statistic are invariant to scale multiples of the response [11]. For each of the 5000 samples we computed the F statistic for the regression on the PLS direction. The ordered values are plotted against percentiles of the F distribution with 1 and 10 degrees of freedom in Fig. 8. If this were the correct reference distribution the points should fall very close to the diagonal; however, there is a clear upward bias that results from the way the PLS direction is selected. The figure allows us to correct for this bias. For instance, with our design a PLS F of 10 is equivalent to an ordinary F of 5, which has significance level 0.05. If instead we were to look up the PLS F value in the ordinary F table we would erroneously conclude that the significance level is 0.01.
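The simulation can be sketched in Python. The generator is the recursion $x_{k+1} = 7^5 x_k \bmod (2^{31}-1)$ with uniforms $x_k/(2^{31}-1)$, and pairs of uniforms become normal deviates by the Box-Muller transformation. The design matrix is an assumption: the twelve observed chromatograms are not reproduced here, so any (12, 514) array can be passed in. The seed and all function names are hypothetical.

```python
import numpy as np

MODULUS = 2**31 - 1   # modulus of the multiplicative congruential generator
MULTIPLIER = 7**5     # 16807

def lehmer_uniforms(state, count):
    """Return `count` uniforms in (0, 1) from x_{k+1} = 7^5 x_k mod (2^31 - 1),
    together with the updated state (state must lie in 1 .. 2^31 - 2)."""
    out = np.empty(count)
    for k in range(count):
        state = (MULTIPLIER * state) % MODULUS
        out[k] = state / MODULUS
    return out, state

def box_muller(u):
    """Map an even number of uniforms to the same number of N(0, 1) deviates."""
    u1, u2 = u[0::2], u[1::2]
    r = np.sqrt(-2.0 * np.log(u1))
    return np.column_stack((r * np.cos(2 * np.pi * u2),
                            r * np.sin(2 * np.pi * u2))).ravel()

def pls_F(X, y):
    """F statistic (1 and n - 2 df) for regressing y on its own first PLS score."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    z = Xc @ (Xc.T @ yc)            # scores on the (unnormalized) PLS direction
    ss_reg = (z @ yc) ** 2 / (z @ z)
    ss_res = yc @ yc - ss_reg
    return ss_reg / (ss_res / (len(y) - 2))

def simulate_null_pls_F(X, n_sim=5000, seed=12345):
    """Sorted PLS F statistics for pure-noise responses and a fixed design X."""
    n = X.shape[0]
    state = seed
    stats = np.empty(n_sim)
    for k in range(n_sim):
        u, state = lehmer_uniforms(state, n + n % 2)   # Box-Muller needs pairs
        y = box_muller(u)[:n]       # unit variance; F is scale invariant anyway
        stats[k] = pls_F(X, y)
    return np.sort(stats)
```

A plot like Fig. 8 can then be drawn by pairing the sorted statistics with `scipy.stats.f.ppf((np.arange(1, n_sim + 1) - 0.5) / n_sim, 1, n - 2)`. The pure-Python generator loop is slow but mirrors the recursion described above; a library generator such as `numpy.random.default_rng` would serve equally well for the null distribution.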

The simulated distribution of the test statistic provides estimates of the significance levels for the regressions of ash, mix time and loaf volume on their respective PLS directions: count the number of times the simulated values exceed the observed values for the data at hand, and divide by the number of random samples generated. The following table shows the correct significance levels and the values that result from using the F distribution for regressing ash, mix time and loaf volume on their PLS directions. The numbers in parentheses are estimated standard deviations that arise from the Monte Carlo sampling technique.


Response      Observed F   F-level            True level
Ash           3.8          0.080              0.38 (±0.007)
Mix time      26.8         $4\times10^{-4}$   0.002 (±0.0006)
Loaf volume   12.0         0.0061             0.03 (±0.0024)
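The estimated level and the standard deviations in parentheses follow from treating each simulated exceedance as a Bernoulli trial, so $\hat p$ has standard error $\sqrt{\hat p(1-\hat p)/N}$. A small Python sketch (function names hypothetical); with N = 5000 it gives about 0.007, 0.0006 and 0.0024 for $\hat p$ = 0.38, 0.002 and 0.03, in line with the table.

```python
import math

def monte_carlo_p_value(simulated, observed):
    """Estimated significance level: fraction of simulated null statistics that
    exceed the observed statistic, with its binomial standard error."""
    n_sim = len(simulated)
    p_hat = sum(1 for s in simulated if s > observed) / n_sim
    return p_hat, binomial_se(p_hat, n_sim)

def binomial_se(p_hat, n_sim):
    """Standard error of a proportion estimated from n_sim independent trials."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_sim)
```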

Hoskuldsson [5] and others have suggested using the F-test for the regression on the PLS component as an approximation. Because of the upward bias, comparing the PLS F statistic to the F distribution is a liberal procedure: ‘non-significance’ according to the F distribution implies non-significance according to the correct distribution of the PLS F statistic, but significance according to the F distribution does not imply significance according to the correct distribution. The above computations show that the difference between the F-level and the true level can be quite dramatic.

Unlike the ordinary regression F statistic, the PLS F statistic has a null distribution that depends on the distribution of the predictor variables. Hence, this statistic has to be recalibrated for each new regression design. Monte Carlo simulation offers a means for performing this calibration. The exact distribution for some very special designs has been worked out in ref. 11.

The fact that certain peaks are given large weight by PLS or principal components does not prove that they are strongly related to the response of interest. Some direction will always be selected, and in high dimensions it is quite possible to obtain a striking plot of the PLS weights that is simply an artifact. From the preceding calculations we conclude that, despite the impressive loadings plot, there is little evidence of a relationship between ash and protein composition. On the other hand, mix time appears to have a rather strong relationship with protein composition; however, further experimentation would be needed to determine whether the peaks indicated by PLS and principal component analysis, which are virtually identical in this case, have a causative relationship or were selected simply because they show the greatest variation. Loaf volume is an intermediate case, showing a moderately significant relationship with protein composition. The corresponding PLS direction differs somewhat from the principal component, and we have an indication that proteins eluting at 17-19 min might be important. Further experimentation would be needed before we could say anything conclusive. Such information is, however, of great potential value, as it gives a tentative indication of specific proteins that, through subsequent isolation and characterization, might explain various attributes or serve as the basis for sensitive and rapid tests.


DISCUSSION 

Data analysis in high dimensions is a tricky business. There is considerable latitude for the selection of ‘factors’ that appear to demonstrate striking relationships. In order to separate the artificial relationships from the real ones, great care should be taken to employ proper statistical inference methods that account for the multiplicity of directions available. One method that we have demonstrated is the use of simulations to get the correct null distribution of the F statistic for regression on the PLS direction. This provides a useful screening procedure for spurious directions.

Our goal in the present investigation is an ambitious one. In addition to classifying or predicting from the chromatogram, we attempt to interpret the weighting vectors produced by the dimension reduction. This is the most difficult aspect of the analysis and the one that is most likely to give spurious results. There is less of a problem if one merely wants to predict or classify without attempting to interpret the weighting vectors. In such instances the PLS dimension reduction is likely to be a useful one, because it chooses projections with maximal covariance with the response. Nevertheless, as we have demonstrated, the standard regression tests and prediction intervals require adjustment for the variable selection.
Abstract


Bandeen-Roche, K. and Ruppert, D., 1991. Source apportionment with one source unknown. Chemometrics and Intelligent Laboratory Systems, 10: 169-184.

Attribution of local pollution to area sources is essential to effective management of the environment. Source apportionment addresses the problem by statistical inference of source contributions to total pollution from observations of ambient air chemical composition. Mass balance methods of source apportionment use linear models with chemical composition vectors of sources as covariates. Historically, mass balance methods have assumed that at least a proxy of each covariate is available and has been accounted for.

We attempt to adapt the mass balance method to the case in which unidentified sources may exist by estimating an unknown, possibly ‘background’, source. Further, we allow source contributions to pollution to vary over time, creating a model with a ‘structural’ parameter and infinitely many ‘incidental’ parameters. We treat the ‘incidental’ source contribution parameters as random quantities. Investigating the properties of the distribution governing relative source contributions is then of interest. Reasonable identifiability constraints are required in this context. Nonparametric estimation of the unknown source is possible under such constraints but is impractical for small samples which are measured with error. Therefore, we develop a parametric model for the distribution of the observations and examine estimates based on this model.


INTRODUCTION 

One of the important problems of environmental engineering is to identify major sources of pollution and determine their relative effects upon the surrounding (‘ambient’) air, water, or some other medium. Attempts have been made to predict cumulative effects based on chemical measurements taken at individual source locations. However, factors such as meteorology, topography, and multiplicity of sources make predicting the effects of sources at a removed location difficult. An alternative approach is to measure samples of the ambient medium. Source contributions to pollution levels are then inferred using statistical methods. The body of methods which has been developed to achieve such inference is known as source apportionment.
The chemical mass balance (CMB) method of source apportionment, developed for study of atmospheric pollution, assumes a linear model for the chemical composition of ambient air. The chemical composition vectors of area pollution sources, called source profiles, are used as covariates, and mass contributions of sources to pollution are considered to be the parameters of interest. Typically, source profiles are given in terms of mass fractions: for instance, milligrams of particulate matter with a given chemical property per gram of particulate source output. Source contributions, then, are often parameterized as concentrations, that is, particulate mass of a given filtering specification contributed by each source per unit mass of that specification, or per unit volume of ambient air. Linearity arises from the assumption that mass is conserved from sources to the ambient air sampler, so that the composition of the observed sample is just a sum of the parameters multiplied (in a vector sense) by the corresponding covariates. Parameter estimation has usually been achieved using variations on standard least-squares methodology. It is important to note that the traditional CMB model treats each ambient profile observation, perhaps time-averaged, as a distinct sample. In this context, vector elements provide repeat observations, and time variation is not accounted for in any explicit way.
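In its simplest form the CMB model writes the ambient composition vector c as $c = Fs + e$, where the columns of F are the source profiles and s holds the source contributions. A minimal sketch of the plain, unweighted least-squares estimate, assuming NumPy and hypothetical profile numbers; the variations on least squares used in practice (for example weighting or nonnegativity constraints) are not shown.

```python
import numpy as np

def cmb_least_squares(profiles, ambient):
    """Estimate source contributions s in ambient ≈ profiles @ s.

    profiles: (n_species, n_sources) array whose columns are source profiles
    ambient:  (n_species,) observed ambient composition
    """
    s, *_ = np.linalg.lstsq(profiles, ambient, rcond=None)
    return s
```

With noise-free data and linearly independent profiles the estimate recovers the contributions exactly; for example, profiles with columns (0.5, 0.3, 0.2) and (0.1, 0.2, 0.7) and contributions (2, 3) are recovered from their ambient mixture.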

As useful as CMB models have proven to be in practice, the methodology has significant shortcomings. Perhaps chief among them is the fact that they require both awareness of all possible sources and knowledge of their chemical compositions, as is illustrated by an example described by Aldershof and Ruppert [1]. Researchers at EPA were interested in the relative contributions of woodstoves and vehicular emissions to local environments. A source profile for woodstove smoke was carefully constructed, but unfortunately the source profile for vehicular emissions was not available at the time. A chemical engineer involved in the study suggested that the profile of the unknown source might be considered as a stable parameter, and that thereby a well-posed model for the composition of area pollution might be formulated. As in usual CMB models, the parameters of interest are the contributions of all sources to pollution. However, their estimation requires estimation of the unknown source profile.

The existence of problems such as that which we have just described has led us to develop methodology which generalizes the traditional CMB model in two ways. Firstly, it allows for the possibility that not all sources have been determined, by estimating an unknown source. We will allow an arbitrary number of known sources but only one unknown source. The case of one unknown source is interesting in its own right, as the woodstove example shows. Moreover, in some situations where there are several unknown sources, investigators will be willing to aggregate all unknown sources into a general ‘background’ unknown. For example, this would be sensible if the relative contributions of the unknown sources were stable over time. After this aggregation of unknown sources our methodology can be applied, though of course only the distribution of the aggregate contribution from the unknown sources will be estimated.

A second, more subtle modification is that our models are formulated for source profiles given in a form which is proportional with respect to a fixed set of chemical species, rather than in mass fraction form. In particular, we define a profile vector by taking the particulate mass per unit of source output due to each member of the fixed set and dividing by the particulate mass per unit attributable to the entire set of species. This is a generalization in the sense that transforming mass profiles to proportional profiles is always possible, whereas the information necessary to perform the converse operation may not be available in some applications. Although this course of action was taken chiefly to accommodate cases when profiles are only given in proportional form (the woodstove data set is such a case), we remark that it is often possible to obtain proportional profiles which are much more accurate than mass fraction profiles (see Kowalczyk et al. [2]). An important spinoff of using proportional profiles, however, is that the total mass contributions of sources to pollution are no longer estimated directly. Instead, source contributions of only those chemical species actually measured and used to define the profile (a quantity of interest in its own right) are estimated. Happily, one may deduce total contributions from the proportional profile parameters if source profiles are available in terms of amounts.
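The conversion from a mass-fraction profile to a proportional profile, and the information needed to go back, fit in a few lines (NumPy assumed; function names hypothetical). If $m_j$ is the mass fraction of species j in the source output and S is the fixed species set, the proportional profile is $p_j = m_j / \sum_{k \in S} m_k$. The converse needs the extra scalar $\sum_{k \in S} m_k$, and the same scalar turns a contribution of the species set into a total mass contribution.

```python
import numpy as np

def to_proportional(mass_profile, species):
    """Proportional profile over the fixed species set: p_j = m_j / sum_S m_k."""
    m = np.asarray(mass_profile, dtype=float)[species]
    return m / m.sum()

def from_proportional(prop_profile, set_mass_fraction):
    """Converse: requires the mass fraction of the whole species set, which the
    proportional profile no longer contains."""
    return np.asarray(prop_profile, dtype=float) * set_mass_fraction

def total_contribution(set_contribution, set_mass_fraction):
    """Total mass contributed by a source, given the species-set mass it
    contributed and the species-set mass fraction of its output."""
    return set_contribution / set_mass_fraction
```

For a mass profile (0.10, 0.05, 0.25, 0.60) and the species set made of the first three entries, the proportional profile is (0.25, 0.125, 0.625), and the set's mass fraction 0.40 is exactly what is lost.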

Studying the estimation of an unknown source in the context of the CMB method has led us to consider two other limitations of the CMB model. The first arises when one attempts to estimate the unknown source, namely the inability of the CMB method to deal with the variations of source contributions over time. To understand the problem, consider the fact that ambient sample composition is determined by the compositions of known sources, a constant unknown source parameter, and source contribution parameters that differ with each observation. This creates, in the terminology of Kiefer and Wolfowitz [3], two classes of parameters: a finite-dimensional ‘structural’ parameter (the profile of the unknown source) and an infinite sequence of ‘incidental’ parameters (daily proportional source contributions). Any reasonable estimator of the structural parameter must include observations corresponding to distinct incidental parameters. However, it is well known that estimation is often impossible if incidental parameters are deterministic. In order to address this difficulty, we have chosen to treat daily source contributions as random quantities. In this context, the distribution of the incidental parameters (source contributions) rather than the individual parameters is estimated.

We will address a second limitation by exploring error structures which are more natural to nonnegative vector observations than the additive, Gaussian error structure implicitly assumed by CMB models.

Henceforth, random-proportion, unknown-source CMB models will be referred to as source apportionment, one source unknown (SASU) models, and we will consider source contributions to be those resulting from a proportional profile formulation unless otherwise specified. We will develop our model, which is no longer linear, and examine its relationship to the traditional CMB model in the next section. To make the exposition simpler, in this paper only the case of a single known source will be treated explicitly. In addition to the nonlinearity of the model, the one-source-unknown case differs fundamentally from the case in which all source profiles are known in that its parameters are not identifiable without the addition of constraints. It will be helpful to examine the case in which observations are made without measurement error, in other words, when day-to-day differences in source contributions provide the only random variation. In this case, a simple constraint allows consistent estimation of the parameters of interest, and asymptotic distributions for the estimates are available. Measurement error complicates estimation considerably, so much so that nonparametric estimation becomes extremely and perhaps prohibitively difficult in a small sample context. Consequently, we will propose an appropriately constrained parametric model and study its behavior.

Source apportionment and CMB models have been discussed by many authors, including Cooper and Watson [4], Gordon [5], and Henry et al. [6]. Introduction of unknown source estimation into CMB methodology was done following ref. 1. Estimation of structural parameters in the presence of incidental parameters was first discussed by Neyman and Scott [7] and has since been a topic of continuing interest; relevant papers include refs. 3 and 8-10. Campbell and Mosimann [11] provide insight into parametric models for proportional data.


SETUP  OF  THE  PROBLEM 

Observations in CMB models are generally the total amounts of various chemical species collected during ambient air sampling, perhaps given as concentrations. When source profiles are proportional, an equivalent, geometrically intuitive formulation of the problem results from standardizing observations to proportions as well. Both formulations prove to be useful in what follows.

The  SASU  model 

Although focus soon shifts to the case in which only one source is known, we will state the model for the general case of $m$ known sources. Let $x_1, \ldots, x_m$ be $p$-dimensional, deterministic covariates, and let $\theta$ be an unknown, $p$-dimensional parameter. In the SASU model, $x_k$, $k = 1, \ldots, m$, are profiles of known sources, $\theta$ is the profile of the unknown source, and each has been standardized to proportions. We impose the resulting constraints

$$\sum_{j=1}^{p} x_{kj} = 1 \quad (k = 1, \ldots, m), \qquad \sum_{j=1}^{p} \theta_j = 1, \qquad x_k \ge 0 \ \forall k, \quad \theta \ge 0. \tag{1}$$

In addition, let $\alpha_i$ be independent and identically distributed (iid), $m$-dimensional random vectors whose components are nonnegative and sum to at most 1. The vectors $\alpha_i$ represent the daily contributions of the known sources, so that the scalars $(1 - \sum_k \alpha_{ik})$ correspond to proportional contributions of the unknown source. Let $G$ be the joint distribution function for the components of $\alpha_i$, that is,

$$G(c) = P\{\alpha_i \le c\},$$

where the inequality holds for each component. Analogously, let $\gamma_i$ be iid, $m$-dimensional random vectors with nonnegative components and $\gamma_i^*$ a nonnegative, real-valued random variable ($i = 1, \ldots, n$). $\gamma_i$ is the vector corresponding to $\alpha_i$, given in terms of amounts, so that $\gamma_i^*$ represents the mass contribution of the unknown source to the set of chemical species defining the profiles. We will denote the joint distribution function of the components of $\gamma_i$ and $\gamma_i^*$ by $F$, that is,

$$F(c) = P\{(\gamma_i', \gamma_i^*)' \le c\},$$

where $V'$ stands for the transpose of the vector $V$. Again, the inequality holds for each component.
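Random variables of interest are

$$y_i = \sum_{k=1}^{m} \alpha_{ik} x_k + \left(1 - \sum_{k=1}^{m} \alpha_{ik}\right)\theta \tag{2}$$

and

$$s_i = \sum_{k=1}^{m} \gamma_{ik} x_k + \gamma_i^* \theta,$$

so that

$$y_i = \frac{s_i}{\sum_j s_{ij}} \qquad \text{and} \qquad \alpha_{ik} = \frac{\gamma_{ik}}{\sum_j s_{ij}}.$$

Components of the vectors $y_i$ and $s_i$ represent true ambient air chemical proportions and amounts, respectively, on day $i$. We observe $Y_i$ and $S_i$, which are measured values of $y_i$ and $s_i$. In the next section we examine the simple case where $Y_i = y_i$ and $S_i = s_i$. We develop below a parametric model for the measurement errors. Notice that in the case of present interest, $m = 1$, $y_i = \alpha_i x + (1 - \alpha_i)\theta$ and $s_i = \gamma_i x + \gamma_i^* \theta$.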
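For the single-known-source case ($m = 1$), the bookkeeping between amounts and proportions can be checked numerically. The following Python sketch uses hypothetical profiles and hypothetical daily amounts $\gamma_i = 2$, $\gamma_i^* = 3$; it builds $s_i$, normalizes it to $y_i$, and confirms that $y_i = \alpha_i x + (1 - \alpha_i)\theta$ with $\alpha_i = \gamma_i / \sum_j s_{ij}$.

```python
def one_day(x, theta, gamma_known, gamma_unknown):
    """Return (s, y, alpha) for one day with a single known source (m = 1).

    s     = gamma_known * x + gamma_unknown * theta   (amounts)
    y     = s / sum(s)                                (proportions)
    alpha = gamma_known / sum(s)                      (known-source share)
    """
    s = [gamma_known * xj + gamma_unknown * tj for xj, tj in zip(x, theta)]
    total = sum(s)  # equals gamma_known + gamma_unknown, since x and theta each sum to 1
    y = [sj / total for sj in s]
    alpha = gamma_known / total
    return s, y, alpha


# Hypothetical proportional profiles (each sums to 1).
x = [0.6, 0.3, 0.1]
theta = [0.1, 0.3, 0.6]

s, y, alpha = one_day(x, theta, gamma_known=2.0, gamma_unknown=3.0)
mixture = [alpha * xj + (1 - alpha) * tj for xj, tj in zip(x, theta)]
```

Because both profiles sum to 1, the total of $s_i$ is just $\gamma_i + \gamma_i^*$, which is why the normalized amounts reproduce the mixture form of eq. (2).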


Transformation  to  CMB  model 


The traditional CMB model is as follows:

$$s_i = \sum_{k=1}^{m+1} c_{ik} a_k, \tag{3}$$

where $c_{ik}$ = total particulate mass contributed by source $k$ per unit volume of ambient air on day $i$, and $a_k$ = mass profile of source $k$.

The subtle difference between the CMB parameters, $c_{ik}$, and the SASU parameters, $\gamma_{ik}$ (equivalently, $\alpha_{ik}$), occurs because information regarding the relative amounts of source outputs not accounted for by the set of measured chemical species is lost in the transformation from mass profiles to proportional profiles. In this section we show how the parameters in our formulation, as given in the SASU model, are related to the parameters in eq. (3).

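Suppose one profile, say $a_{m+1}$, is unknown. Let $\zeta_i = \sum_j s_{ij}$. Clearly,

$$\theta = \frac{a_{m+1}}{\sum_j a_{m+1,j}} \qquad \text{and} \qquad y_i = \frac{s_i}{\zeta_i},$$

while $x_k = a_k / \sum_j a_{kj}$, $k = 1, \ldots, m$. Therefore, eq. (2) is equivalent to

$$\frac{s_i}{\zeta_i} = \sum_{k=1}^{m} \alpha_{ik} \frac{a_k}{\sum_j a_{kj}} + \left(1 - \sum_{k=1}^{m} \alpha_{ik}\right) \frac{a_{m+1}}{\sum_j a_{m+1,j}},$$

which implies that $c_{ik} = \zeta_i \alpha_{ik} / \sum_j a_{kj}$ and $c_{ik} a_k = \zeta_i \alpha_{ik} x_k$, $k = 1, \ldots, m$ (similarly for $k = m + 1$, with $\theta$ in place of $x_k$ and $1 - \sum_k \alpha_{ik}$ in place of $\alpha_{ik}$).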
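As a small numerical illustration of these relations, the sketch below converts a proportional contribution into a total particulate mass contribution. The mass profile $a_k$ (here the measured species make up 40% of the source's particulate mass) and the values of $\zeta_i$ and $\alpha_{ik}$ are hypothetical; the code simply evaluates $c_{ik} = \zeta_i \alpha_{ik} / \sum_j a_{kj}$ and checks that $c_{ik} a_k = \zeta_i \alpha_{ik} x_k$.

```python
def total_mass_contribution(zeta_i, alpha_ik, a_k):
    """c_ik = zeta_i * alpha_ik / sum_j a_kj."""
    return zeta_i * alpha_ik / sum(a_k)


def species_contributions(zeta_i, alpha_ik, a_k):
    """c_ik * a_kj for every species j: mass of species j from source k on day i."""
    c_ik = total_mass_contribution(zeta_i, alpha_ik, a_k)
    return [c_ik * a_kj for a_kj in a_k]


# Hypothetical mass profile: species mass per unit particulate mass emitted by source k.
a_k = [0.2, 0.1, 0.1]
x_k = [a / sum(a_k) for a in a_k]       # proportional profile, about [0.5, 0.25, 0.25]
zeta_i, alpha_ik = 10.0, 0.3            # hypothetical measured-species total and share

c_ik = total_mass_contribution(zeta_i, alpha_ik, a_k)      # about 7.5
by_species = species_contributions(zeta_i, alpha_ik, a_k)
check = [zeta_i * alpha_ik * xj for xj in x_k]             # zeta_i * alpha_ik * x_k
```

The source delivers 3 mass units of measured species on this hypothetical day, and dividing by the 40% share gives its total particulate contribution.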

Knowledge of $\zeta_i$, $\theta$, and $\alpha_i$, then, is sufficient to determine $c_{ik} a_k$, $k = 1, \ldots, m + 1$. (Physically, $c_{ik} a_{kj}$ represents the amount of the $j$th chemical species contributed by source $k$ to the ambient air sample on day $i$.) However, we will be estimating only the distribution of the $\alpha_i$ values, because it is impossible to consistently estimate the $\alpha_i$ values themselves in the presence of measurement error (see below). It is possible to get around this inconvenience in the following way.

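Let $q_{kj} = c_{\cdot k} a_{kj}$, where $c_{\cdot k} = \sum_i c_{ik}$. Then

$$\frac{q_{kl}}{q_{kj}} = \frac{a_{kl}}{a_{kj}} = \frac{a_{kl} / \sum_n a_{kn}}{a_{kj} / \sum_n a_{kn}} = \frac{x_{kl}}{x_{kj}}. \tag{4}$$

In cross-multiplied form, these relations give a system of linear equations in the $p(m + 1)$ unknowns $q_{kj}$ (with $x_{m+1} = \theta$). Eq. (3) implies

$$\sum_{k=1}^{m+1} q_{kj} - s_{\cdot j} = 0, \qquad j = 1, \ldots, p, \tag{5}$$

where $s_{\cdot j} = \sum_i s_{ij}$. Eq. (4) implies

$$x_{k1} q_{kj} - x_{kj} q_{k1} = 0, \qquad k = 1, \ldots, m + 1, \quad j = 2, \ldots, p. \tag{6}$$

There are a total of $p + (m + 1)(p - 1) = p(m + 1) + (p - m - 1)$ equations. Hence, we may solve for $q_k$, $k = 1, \ldots, m + 1$, if $p \ge m + 1$, that is, if there are more species than known sources. If, in addition, the mass source profiles $a_k$, $k = 1, \ldots, m$, are known, solutions for the corresponding source contributions $c_{\cdot k}$ follow immediately. Notice that these solutions are corrected for the contribution of the unknown source.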
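One way to carry out this computation in code is sketched below for hypothetical profiles. Rather than assembling eqs. (5) and (6) literally, the sketch uses the fact that (6) forces each row $q_k$ to be proportional to its profile, $q_k = t_k x_k$; eq. (5) then becomes the small regression $s_{\cdot} = \sum_k t_k x_k$, solved here through its normal equations. The value of $\theta$ is assumed to be already estimated, the helper names are illustrative, and the profiles must be linearly independent (which requires $p \ge m + 1$).

```python
def solve_square(A, b):
    """Solve the small square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def species_totals(profiles, s_dot):
    """Return (q, t) with q[k][j] = t[k] * profiles[k][j].

    profiles: proportional profiles x_1, ..., x_m followed by (an estimate of) theta.
    s_dot:    ambient amounts summed over days, s_.j = sum_i s_ij.
    t solves the least-squares problem s_dot ~ sum_k t_k * profile_k.
    """
    K, p = len(profiles), len(s_dot)
    gram = [[sum(profiles[a][j] * profiles[b][j] for j in range(p)) for b in range(K)]
            for a in range(K)]
    rhs = [sum(profiles[a][j] * s_dot[j] for j in range(p)) for a in range(K)]
    t = solve_square(gram, rhs)
    q = [[t[k] * profiles[k][j] for j in range(p)] for k in range(K)]
    return q, t


x = [0.6, 0.3, 0.1]        # hypothetical known profile
theta = [0.1, 0.3, 0.6]    # hypothetical (estimated) unknown profile
s_dot = [3.0, 3.0, 4.0]    # generated as 4 * x + 6 * theta
q, t = species_totals([x, theta], s_dot)

# With the known source's mass profile, c_.k = t_k / sum_j a_kj.
a_known = [0.3, 0.15, 0.05]        # hypothetical mass profile, proportional to x
c_known = t[0] / sum(a_known)      # total particulate mass from the known source
```

With exact data the regression residual is zero, so $q$ satisfies both (5) and (6); with noisy data the same code returns a least-squares compromise, which is a choice made for this sketch rather than something prescribed by the text.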


NONPARAMETRIC  MODELS 
No  measurement  error 

For now, we will use the formulation of the SASU model for which observations are proportional. In the case where observations occur without measurement error,

$$Y_i = y_i = \alpha_i (x - \theta) + \theta. \tag{7}$$

In this section we take a nonparametric approach, in that the distribution $G$ of $\alpha_i$ is not assumed to be in a parametric family. Without the natural constraints mentioned above and an additional restriction on $G$, the model (eq. (7)) is not identifiable. To understand why, consider a simple transformation: let $Y_i - x = Z_i$, $\theta - x = \phi$, and $1 - \alpha_i = \lambda_i$. It is clear that eq. (7) is equivalent to

$$Z_i = \lambda_i \phi. \tag{8}$$

Let $\lambda_i^* = \lambda_i / 2$ and $\phi^* = 2\phi$. Then $Z_i^* = \lambda_i^* \phi^*$ has exactly the same distribution as $Z_i$. This means that the parameters of $Z_i$ are not identifiable from its distribution. In fact, the model (7) implies that $\operatorname{Corr}[Y_{ij}, Y_{il}] = \pm 1$ for all $j, l$ (for coordinates with $x_j \ne \theta_j$) and, hence, that the $p$-dimensional system effectively reduces to one dimension.
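The scaling argument can be seen numerically: replacing $(\lambda_i, \phi)$ by $(\lambda_i / 2, 2\phi)$ reproduces the same observations, and any two coordinates that vary are perfectly correlated. The values of $\phi$ and the draws of $\lambda_i$ below are hypothetical.

```python
import random


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


rng = random.Random(0)
phi = [-0.5, 0.0, 0.5]                         # hypothetical phi = theta - x
lam = [rng.random() for _ in range(50)]        # hypothetical draws of lambda_i = 1 - alpha_i

Z = [[lam_i * f for f in phi] for lam_i in lam]                    # Z_i = lambda_i * phi, eq. (8)
Z_star = [[(lam_i / 2) * (2 * f) for f in phi] for lam_i in lam]   # (lambda_i / 2) * (2 * phi)

identical = all(abs(a - b) < 1e-15 for zi, zs in zip(Z, Z_star) for a, b in zip(zi, zs))
corr_first_last = pearson([z[0] for z in Z], [z[2] for z in Z])   # -1 up to rounding
```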

Realizing that nonidentifiability occurs because our model allows too much scaling suggests an appropriate constraint: confining allowable distributions for $\alpha_i$ to those whose left boundary of support is exactly 0, that is, $G(a) > 0$ for each $a > 0$. It follows that

$$\lim_{n \to \infty} \min_{1 \le i \le n} \alpha_i = 0 \ \text{with probability 1} \quad \text{or, equivalently,} \quad \lim_{n \to \infty} \max_{1 \le i \le n} \lambda_i = 1 \ \text{with probability 1}, \tag{9}$$

which means that if enough samples are taken, eventually one will be composed almost entirely of chemicals contributed by the unknown source. Our motivating example, described in the introduction, provides some insight: on summer days, one would not expect people to be using woodstoves, and condition (9) appears reasonable.

Several results follow under condition (9). When there are no measurement errors, the observations $Y_i$ lie on the line segment in $p$-dimensional space connecting the known $x$ and the unknown $\theta$. This suggests a simple estimator of $\theta$: the observation farthest from $x$. In fact, it is not difficult to prove the following.

Proposition 1. Define $Y_n^*$ to be the observation $Y_l$ ($1 \le l \le n$) such that $\|Y_l - x\| = \max_{1 \le i \le n} \|Y_i - x\|$. Then $Y_n^*$ is a consistent estimator of $\theta$ if and only if condition (9) holds.
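A minimal simulation of Proposition 1, using hypothetical profiles and noise-free data from eq. (7). With $G$ uniform on $(0, 1)$, condition (9) holds and the farthest observation approaches $\theta$; with $G$ uniform on $(0.2, 1)$ the condition fails and the estimator stays at least $0.2\,\|x - \theta\|$ away. The function names are illustrative.

```python
import math
import random


def farthest_from(x, observations):
    """Y*_n: the observation with the largest Euclidean distance from x."""
    return max(observations, key=lambda y: math.dist(y, x))


def noise_free_sample(rng, x, theta, n, draw_alpha):
    """Observations from eq. (7): Y_i = alpha_i * (x - theta) + theta."""
    out = []
    for _ in range(n):
        a = draw_alpha(rng)
        out.append([a * (xj - tj) + tj for xj, tj in zip(x, theta)])
    return out


x = [0.6, 0.3, 0.1]        # hypothetical known profile
theta = [0.1, 0.3, 0.6]    # hypothetical unknown profile
rng = random.Random(42)

# G = Uniform(0, 1) satisfies condition (9): G(a) > 0 for every a > 0.
good = noise_free_sample(rng, x, theta, 2000, lambda r: r.random())
# G = Uniform(0.2, 1) violates (9): every alpha_i is at least 0.2.
bad = noise_free_sample(rng, x, theta, 2000, lambda r: r.uniform(0.2, 1.0))

err_good = math.dist(farthest_from(x, good), theta)
err_bad = math.dist(farthest_from(x, bad), theta)
```

Since $\|Y_i - \theta\| = \alpha_i \|x - \theta\|$ without measurement error, the error of $Y_n^*$ is exactly $\min_i \alpha_i \, \|x - \theta\|$, which is why condition (9) is both necessary and sufficient.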

Once we estimate $\theta$, we can then estimate the contributions $\alpha_i$ of the known source. The basic idea is that $\alpha_i$ is the distance between $Y_i$ and $\theta$ expressed as a fraction of the distance between $x$ and $\theta$. Note that $Y_n^*$ is a monotone sequence and, thus, $\theta_{jn}$, equal to the $j$th component of $Y_n^*$, satisfies the conditions of the following result.

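Proposition 2. Let $\theta_j$ be such that $|x_j - \theta_j| \ne 0$. Let $\theta_{jn}$ be any monotone sequence whose limit is $\theta_j$. Define the e.d.f.

$$G_n(a) = \frac{1}{n} \sum_{i=1}^{n} I\left\{ \frac{Y_{ij} - \theta_{jn}}{x_j - \theta_{jn}} \le a \right\},$$

where $I$ is the indicator function and e.d.f. stands for empirical distribution function. Define

$$D_n = \sup_{-\infty < a < \infty} |G_n(a) - G(a)|.$$

Under (7), $D_n \to 0$ with probability 1.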

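Define $\alpha_{in} = (Y_{ij} - \theta_{jn}) / (x_j - \theta_{jn})$. Then $G_n$ is the empirical distribution function of $\{\alpha_{1n}, \ldots, \alpha_{nn}\}$. Because there are no measurement errors, $(\alpha_{in} - \alpha_i) \to 0$ with probability 1. As it is a matter of algebra to show that

$$\max_{1 \le i \le n} |\alpha_{in} - \alpha_i| \le \left| 1 - \frac{x_j - \theta_j}{x_j - \theta_{jn}} \right|,$$

the stronger statement $\max_{1 \le i \le n} |\alpha_{in} - \alpha_i| \to 0$ with probability 1 also holds.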
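Continuing the noise-free illustration, this sketch computes $\alpha_{in}$ and the empirical distribution function $G_n$ of Proposition 2, with $\theta_{jn}$ taken from $Y_n^*$, and checks the algebraic bound used above. The profiles, the uniform choice of $G$, the seed, and the function names are assumptions made for the example.

```python
import bisect
import random


def alpha_estimates(observations, x, theta_n, j):
    """alpha_in = (Y_ij - theta_jn) / (x_j - theta_jn); requires x_j != theta_jn."""
    return [(y[j] - theta_n[j]) / (x[j] - theta_n[j]) for y in observations]


def edf(values):
    """Empirical distribution function G_n of `values`."""
    ordered = sorted(values)
    return lambda a: bisect.bisect_right(ordered, a) / len(ordered)


x = [0.6, 0.3, 0.1]        # hypothetical known profile
theta = [0.1, 0.3, 0.6]    # hypothetical unknown profile
rng = random.Random(7)
alphas = [rng.random() for _ in range(2000)]                              # G = Uniform(0, 1)
Y = [[a * (xj - tj) + tj for xj, tj in zip(x, theta)] for a in alphas]   # eq. (7)

# theta_n = Y*_n, the observation farthest from x (Proposition 1).
theta_n = max(Y, key=lambda y: sum((u - v) ** 2 for u, v in zip(y, x)))
j = 0                                                   # x_0 - theta_0 = 0.5, nonzero
alpha_n = alpha_estimates(Y, x, theta_n, j)
G_n = edf(alpha_n)

max_err = max(abs(a_hat - a) for a_hat, a in zip(alpha_n, alphas))
bound = abs(1 - (x[j] - theta[j]) / (x[j] - theta_n[j]))   # right-hand side of the bound
```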

We have seen that $\theta$ can be estimated because when $\alpha_i$ is close to 0, $Y_i$ consists mostly of the contribution from the unknown source. The rate at which $Y_n^*$ converges to $\theta$ depends on how fast $\min_{1 \le i \le n} \alpha_i$ approaches 0, which in turn depends on the behavior of $G$ near 0. In fact, extreme value theory provides an asymptotic distribution for $Y_n^*$.

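Proposition 3. Suppose that support$\{\alpha\} = (0, c]$, $c < 1$. Suppose also that $G(a) = K a^{\beta}$ on $0 < a < K^{-1/\beta}$. Then, for any coordinate $j$ with $x_j \ne \theta_j$,

$$P\left\{ (nK)^{1/\beta} \, \frac{Y^*_{nj} - \theta_j}{x_j - \theta_j} \le a \right\} \to \begin{cases} 0 & \text{if } a < 0, \\ 1 - \exp\{-a^{\beta}\} & \text{if } a \ge 0. \end{cases}$$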

The key fact in the proof of Proposition 3 is as follows: given independent observations from $G$, extreme value theory dictates that

$$P\left\{ (nK)^{1/\beta} \min_{1 \le i \le n} \alpha_i \le a \right\} \to H(a),$$

where $\to$ denotes weak convergence and $H$ is the distribution such that $H(a) = 1 - \exp\{-a^{\beta}\}$ for nonnegative values of $a$ and 0 otherwise. Leadbetter et al. [12] provide an excellent reference to extreme value theory.
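A quick Monte Carlo check of the extreme value limit: draws from $G(a) = K a^{\beta}$ are generated by inverse transform, and the scaled minimum $(nK)^{1/\beta} \min_i \alpha_i$ is compared with the limit $H$, whose mean is $\Gamma(1 + 1/\beta)$ and whose value at 1 is $1 - e^{-1}$. The values $K = 4$, $\beta = 2$, the sample size, and the number of replications are hypothetical choices.

```python
import math
import random


def scaled_minimum(rng, n, K, beta):
    """Return (nK)^(1/beta) * min(alpha_1, ..., alpha_n) for alpha_i drawn from
    G(a) = K * a**beta on [0, K**(-1/beta)], via the inverse CDF a = (U / K)**(1/beta)."""
    smallest = min((rng.random() / K) ** (1.0 / beta) for _ in range(n))
    return (n * K) ** (1.0 / beta) * smallest


rng = random.Random(2024)
K, beta, n, reps = 4.0, 2.0, 500, 2000          # hypothetical settings
draws = [scaled_minimum(rng, n, K, beta) for _ in range(reps)]

mean_draws = sum(draws) / reps
limit_mean = math.gamma(1.0 + 1.0 / beta)       # mean of H(a) = 1 - exp(-a**beta)
frac_below_one = sum(d <= 1.0 for d in draws) / reps
limit_frac = 1.0 - math.exp(-1.0)               # H(1)
```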

With  measurement  error 

Given the results attainable in the case of no measurement error, one might hope that under condition (9) similar results might hold in the case with measurement error. Unfortunately, this does not appear to be the case. Of course, condition (9) is still necessary, but introducing measurement error makes the problem much harder. For one thing, it makes consistent estimation of the individual $\alpha_i$ values impossible. In the case of no measurement error, it is possible to write $\alpha_i$ as a function of the observations and the structural parameter. Since every observation contributes to the estimation of $\theta$, every observation contributes to the estimation of $\alpha_i$ through the function (recall the estimator $\alpha_{in}$ discussed following Proposition 2). In the presence of measurement error, it is no longer possible to write $\alpha_i$ as a function of the observations and the structural parameter (we no longer see the true value of the observation). Hence, in effect, only finitely many observations (1 vector observation or $p$ scalar observations) contribute to the estimation of $\alpha_i$, so consistent estimation is impossible.

Estimating the structural parameter is much harder as well. It is still helpful to think of the problem in terms of estimating the endpoint of the line segment between the known $x$ and the unknown $\theta$. When observations are made with measurement error, however, they appear as a 'cloud' of points about the line segment rather than being confined to the segment itself. The estimator $Y_n^*$ defined in the previous section, then, will eventually overshoot $\theta$ if the cloud extends far enough: formally, $Y_n^*$ converges to a support boundary of the distribution of $Y_i$ rather than to $\theta$, the support boundary of the distribution of $y_i$. If one were to assume additive errors for the observations given in terms of amounts, $S_i$ (or, more appropriately, for some transformation of $S_i$), the method of deconvolution could be used, in effect, to account for the measurement error. Such an approach is capable of estimating $\theta$ consistently. Unfortunately, convergence rates of nonparametric deconvolution estimators are inherently very slow. Carroll and Hall [13] have shown for a large class of distributions that, in the case of normal error, no deconvolution estimator can achieve a rate faster than $(\log n)^{-1}$. It is possible, however, that a higher rate may obtain for distributions confined to a bounded support. Also, it is known that certain functionals of deconvolution estimators converge significantly faster than the estimators themselves, and it is not unreasonable to expect that an estimator of $\theta$ could be one of them. Further research is necessary to investigate these possibilities.
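The overshoot can be illustrated with a simulated cloud. Purely for this illustration, measurement error is generated as a Dirichlet perturbation around the segment (the parametric form adopted in the next section), with hypothetical profiles, $\alpha_i$ uniform on $(0, 1)$, and a hypothetical Dirichlet scale of 20; the observation farthest from $x$ then lies beyond $\theta$.

```python
import math
import random


def rdirichlet(rng, params):
    """One Dirichlet draw: independent Gamma(params_j, 1) variables divided by their sum."""
    g = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(g)
    return [v / total for v in g]


x = [0.6, 0.3, 0.1]        # hypothetical known profile
theta = [0.1, 0.3, 0.6]    # hypothetical unknown profile
scale = 20.0               # hypothetical Dirichlet scale; smaller means noisier
rng = random.Random(11)

cloud = []
for _ in range(1000):
    a = rng.random()                                         # alpha_i ~ Uniform(0, 1)
    mean = [a * xj + (1 - a) * tj for xj, tj in zip(x, theta)]
    cloud.append(rdirichlet(rng, [scale * m for m in mean]))

y_star = max(cloud, key=lambda y: math.dist(y, x))           # Y*_n from Proposition 1
overshoot = math.dist(y_star, x) - math.dist(theta, x)       # positive: beyond theta
```

With more observations the overshoot does not shrink; it tracks the edge of the noise cloud rather than $\theta$.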


A  PARAMETRIC  MODEL 

In order to produce estimators which achieve reasonable rates of convergence for moderate sample sizes, it appears that parametric models are required for both $G$ and the measurement error. The discussion in the previous section indicates that any reasonable model for time variation must satisfy condition (9). In keeping with the spirit of maximum generality, we will model proportional observations, which suggests that we utilize distributions inherently appropriate for proportional data.

With these considerations in mind, we have chosen to model both time variation and measurement error with the Dirichlet distribution. A generalization of the Beta distribution, the Dirichlet distribution is especially well suited to modeling proportional vectors created by dividing amounts observations by their sum. Such vectors are exactly Dirichlet-distributed whenever the amounts are independent of each other and the proportions which result from dividing the amounts by their sum are independent of the sum; whenever the amounts are independent gamma random variables with common scale; and, in certain cases, when the amounts are positively correlated. In addition, Dirichlet random variables satisfy some very convenient properties. Let $Y$ be a $p$-dimensional Dirichlet random vector with $p$-dimensional parameter vector $\delta$ (see below). Any permutation of $Y$ is Dirichlet with parameter equal to the corresponding permutation of $\delta$. Also, suppose $Z$ is an amalgamation over some partition, $A = \{a_1, \ldots, a_q\}$, of the coordinates of $Y$, in other words,

$$Z = \left( \sum_{j \in a_1} Y_j, \ldots, \sum_{j \in a_q} Y_j \right)',$$

with $q < p$. Then $Z$ is Dirichlet with parameter equal to the corresponding amalgamation of $\delta$. These properties will allow us to combine and permute coordinates of observations in order to improve estimates without changing the underlying model for estimation (see below). Campbell and Mosimann [11] provide a basic summary of these and other properties of the Dirichlet distribution.
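The gamma construction and the amalgamation property can be sketched in a few lines. The parameter vector and the partition below are hypothetical; the Monte Carlo comparison checks only the mean of one amalgamated coordinate, which is consistent with, but does not by itself establish, the amalgamation property.

```python
import random


def rdirichlet(rng, params):
    """Dirichlet draw: independent Gamma(params_j, scale 1) variables divided by their sum."""
    g = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(g)
    return [v / total for v in g]


def amalgamate(vec, partition):
    """Sum the entries of `vec` over each block of `partition` (a list of index lists)."""
    return [sum(vec[j] for j in block) for block in partition]


delta = [1.0, 2.0, 3.0, 4.0]       # hypothetical Dirichlet parameter, Delta = 10
partition = [[0, 3], [1], [2]]     # hypothetical partition of the four coordinates
delta_Z = amalgamate(delta, partition)   # parameter of the amalgamated vector: [5.0, 2.0, 3.0]

rng = random.Random(3)
Z = [amalgamate(rdirichlet(rng, delta), partition) for _ in range(20000)]
mean_first = sum(z[0] for z in Z) / len(Z)    # should be close to 5 / 10
```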

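In general, the Dirichlet density has the form

$$f_D(y \mid \delta) = \frac{\Gamma(\Delta)}{\prod_{j=1}^{p} \Gamma(\delta_j)} \prod_{j=1}^{p} y_j^{\delta_j - 1}, \qquad \text{where } \Delta = \sum_{j=1}^{p} \delta_j.$$

It follows that the first two moments of a Dirichlet random variable, $Y$, with distribution $f_D(y \mid \delta)$ are

$$E[Y_j] = \mu_j = \frac{\delta_j}{\Delta}, \tag{10}$$

$$\operatorname{Var}[Y_j] = \frac{\delta_j (\Delta - \delta_j)}{\Delta^2 (\Delta + 1)} = \frac{\mu_j (1 - \mu_j)}{\Delta + 1}. \tag{11}$$

In general, the $k$th moment of $Y_j$ is

$$E[Y_j^k] = \frac{\prod_{i=0}^{k-1} (\delta_j + i)}{\prod_{i=0}^{k-1} (\Delta + i)}. \tag{12}$$

Notice also that the coefficient of variation of $Y_j$ is

$$\operatorname{CV}[Y_j] = \sqrt{\frac{1 - \mu_j}{\mu_j (\Delta + 1)}}. \tag{13}$$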
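Eqs. (10)-(13) translate directly into code. The helper names and the parameter vector are illustrative; the checks confirm that (11) agrees with (12) for $k = 2$ and that (13) equals the standard deviation divided by the mean. Multiplying $\delta$ by 10 keeps every $\mu_j$ but multiplies $\Delta$ by 10, and the CV drops, which is why a larger $\Delta$ means milder measurement error.

```python
import math


def dirichlet_raw_moment(delta, j, k):
    """E[Y_j^k] = prod_{i<k}(delta_j + i) / prod_{i<k}(Delta + i), eq. (12)."""
    Delta = sum(delta)
    num = den = 1.0
    for i in range(k):
        num *= delta[j] + i
        den *= Delta + i
    return num / den


def dirichlet_mean(delta, j):
    """E[Y_j] = delta_j / Delta, eq. (10)."""
    return delta[j] / sum(delta)


def dirichlet_var(delta, j):
    """Var[Y_j] = mu_j (1 - mu_j) / (Delta + 1), eq. (11)."""
    mu = dirichlet_mean(delta, j)
    return mu * (1 - mu) / (sum(delta) + 1)


def dirichlet_cv(delta, j):
    """CV[Y_j] = sqrt((1 - mu_j) / (mu_j (Delta + 1))), eq. (13)."""
    mu = dirichlet_mean(delta, j)
    return math.sqrt((1 - mu) / (mu * (sum(delta) + 1)))


delta = [1.0, 2.0, 3.0, 4.0]           # hypothetical parameter, Delta = 10
delta_10x = [10 * d for d in delta]    # same means, Delta = 100
```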
We will model the measurement error process by assuming that the conditional distribution of $Y$ given $\alpha$ is Dirichlet with mean $\alpha x + (1 - \alpha)\theta$ and scale independent of $\alpha$. Therefore, assume that $Y_i$ is Dirichlet with parameter

$$\delta_{ij} = \Delta\big(\alpha_i (x_j - \theta_j) + \theta_j\big)$$

for some constant $\Delta > 0$ (note that $E[Y_i \mid \alpha_i] = \alpha_i x + (1 - \alpha_i)\theta$ and that $\sum_j \delta_{ij} = \Delta$ for each $i$). Now, the marginal distribution of $Y_i$ is obtained by integrating $f_D$ over the distribution of $\alpha_i$, which we hypothesize to be the Beta$(\lambda_1, \lambda_2)$ = Dirichlet$(\lambda_1, \lambda_2)$ distribution. In other words, the density of the marginal distribution of $Y_i$ has the form

$$f(y_i) = \int_0^1 f_D(y_i \mid \alpha) \, \frac{\Gamma(\lambda_1 + \lambda_2)}{\Gamma(\lambda_1)\Gamma(\lambda_2)} \, \alpha^{\lambda_1 - 1} (1 - \alpha)^{\lambda_2 - 1} \, d\alpha, \tag{14}$$

where $f_D(y \mid \alpha) = f_D\big(y \mid \Delta(\alpha(x - \theta) + \theta)\big)$. Eq. (14) corresponds to taking an average of the densities $f_D$ at $y_i$, given each possible value of $\alpha$, weighted by the probability of $\alpha$.

In the development which follows, we will be using the quantity $1 - \alpha$ rather than $\alpha$. From the permutation property mentioned above, it is clear that $(1 - \alpha)$ has a Beta$(\lambda_2, \lambda_1)$ distribution. Letting $\lambda = \lambda_2$ and $\Lambda = \lambda_1 + \lambda_2$, we may reparameterize the beta parameters from $\{\lambda_1, \lambda_2\}$ to $\{\lambda, \Lambda\}$. Certain functions of the source contribution parameters are of at least as much interest as the parameters themselves: for example, $\lambda / \Lambda = E[1 - \alpha]$ and $\lambda(\Lambda - \lambda) / (\Lambda^2 (\Lambda + 1)) = \operatorname{Var}[1 - \alpha] = \operatorname{Var}[\alpha]$. From now on we will refer to $\Delta$ as the error parameter and $\{\lambda, \Lambda\}$ as the source contribution parameters.

Given $p \ge 3$, all of the parameters of this model are identifiable from its moments of order 3 and less. In other words, these moments completely determine $\Delta$, $\lambda$, $\Lambda$, and $\theta$. Therefore, method of moments estimators for the parameters are consistent. The moment equations may be developed as follows. Let $Y_i$ be an observation from eq. (14), where $f_D$ is as defined above. Recall that $Z_i = Y_i - x$ and $\phi = \theta - x$. Using eqs. (10) and (12) and the fact that, for any random variables $U$ and $V$, $E[U] = E[E[U \mid V]]$, we may write for each coordinate $j$, $j = 1, \ldots, p$:

$$m_{1j} = E[Z_{ij}] = E\big(E[Z_{ij} \mid \alpha_i]\big) = \phi_j E[1 - \alpha_i] = \phi_j \frac{\lambda}{\Lambda}. \tag{15}$$

Similarly,

$$m_{2j} = E[Z_{ij}^2] = \frac{\Delta \phi_j^2 E[(1 - \alpha_i)^2] + \phi_j (1 - 2x_j) E[1 - \alpha_i] + x_j (1 - x_j)}{\Delta + 1} = \frac{1}{\Delta + 1} \left[ \frac{\Delta \phi_j^2 \lambda (\lambda + 1)}{\Lambda (\Lambda + 1)} + \frac{\phi_j (1 - 2x_j) \lambda}{\Lambda} + x_j (1 - x_j) \right], \tag{16}$$

$$m_{3j} = E[Z_{ij}^3] = \frac{1}{(\Delta + 1)(\Delta + 2)} \left[ \frac{\Delta^2 \phi_j^3 \lambda (\lambda + 1)(\lambda + 2)}{\Lambda (\Lambda + 1)(\Lambda + 2)} + \frac{3\Delta (1 - 2x_j) \phi_j^2 \lambda (\lambda + 1)}{\Lambda (\Lambda + 1)} + \frac{[3x_j (1 - x_j)(\Delta - 2) + 2] \phi_j \lambda}{\Lambda} + 2x_j (2x_j^2 - 3x_j + 1) \right]. \tag{17}$$
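Eqs. (15)-(17) can be checked numerically. The sketch below evaluates them in closed form and compares the result with a direct numerical integration over $\alpha$ of the conditional moments of $Y_{ij}$: given $\alpha$, $Y_{ij}$ is Beta distributed by the amalgamation property, so its raw moments follow from eq. (12). The parameter values come from coordinate 5 of the Fig. 1 setting under scenario (ii) and from a second, hypothetical coordinate; the function names and the midpoint-rule grid are illustrative choices.

```python
import math


def one_minus_alpha_moment(lam, Lam, k):
    """E[(1 - alpha)^k] when 1 - alpha ~ Beta(lam, Lam - lam)."""
    out = 1.0
    for i in range(k):
        out *= (lam + i) / (Lam + i)
    return out


def population_moments(xj, phij, Delta, lam, Lam):
    """m_1j, m_2j, m_3j from eqs. (15)-(17); phij = theta_j - x_j."""
    e1, e2, e3 = (one_minus_alpha_moment(lam, Lam, k) for k in (1, 2, 3))
    m1 = phij * e1
    m2 = (Delta * phij**2 * e2 + phij * (1 - 2 * xj) * e1 + xj * (1 - xj)) / (Delta + 1)
    m3 = (Delta**2 * phij**3 * e3
          + 3 * Delta * (1 - 2 * xj) * phij**2 * e2
          + (3 * xj * (1 - xj) * (Delta - 2) + 2) * phij * e1
          + 2 * xj * (2 * xj**2 - 3 * xj + 1)) / ((Delta + 1) * (Delta + 2))
    return m1, m2, m3


def moments_by_quadrature(xj, phij, Delta, lam, Lam, grid=20000):
    """Integrate E[Z_j^k | alpha], k = 1, 2, 3, over alpha ~ Beta(Lam - lam, lam)."""
    a1, a2 = Lam - lam, lam
    norm = math.gamma(a1 + a2) / (math.gamma(a1) * math.gamma(a2))
    h = 1.0 / grid
    totals = [0.0, 0.0, 0.0]
    for i in range(grid):
        a = (i + 0.5) * h                                   # midpoint rule on (0, 1)
        weight = norm * a ** (a1 - 1) * (1 - a) ** (a2 - 1) * h
        d = Delta * (xj + (1 - a) * phij)                   # Y_j | alpha ~ Beta(d, Delta - d)
        ey1 = d / Delta
        ey2 = ey1 * (d + 1) / (Delta + 1)
        ey3 = ey2 * (d + 2) / (Delta + 2)
        totals[0] += weight * (ey1 - xj)
        totals[1] += weight * (ey2 - 2 * xj * ey1 + xj**2)
        totals[2] += weight * (ey3 - 3 * xj * ey2 + 3 * xj**2 * ey1 - xj**3)
    return tuple(totals)


# Coordinate 5 of the Fig. 1 setting (x_j = 0.2, theta_j = 0.4), scenario (ii), Delta = 10.
closed = population_moments(0.2, 0.2, 10.0, 2.0, 4.0)
numeric = moments_by_quadrature(0.2, 0.2, 10.0, 2.0, 4.0)
```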

Method of moments estimators are formed by substituting the sample moments

$$M_{qj} = \frac{1}{n} \sum_{i=1}^{n} (Y_{ij} - x_j)^q \tag{18}$$

for $m_{qj}$ in eqs. (15)-(17) and solving for the parameters of interest. As the moment equations overdetermine the parameters, however, moment estimators are not unique. In the following sections we will develop several different estimators, examine their performance under the model in a simulation study, and test them on a famous simulated source apportionment data set. Computation of estimates and error measures, as well as generation of 'random' observations, was performed on an AST PC with a 286 processor using the GAUSS system, version 2.0.

Fig. 1. Dirichlet mixture data. Observations are generated from model (14) with parameters $(\lambda_1, \lambda_2)' = (2, 2)'$, $\theta = (0, 0.05, 0.1, 0.2, 0.4, 0.25)'$, and $x = (0.2, 0.2, 0.2, 0.2, 0.2, 0)'$. Contrast of case (a), $\Delta = 10$, with case (b), $\Delta = 100$, illustrates the role of the $\Delta$ parameter. The 5th data component is plotted against the 4th (horizontal axis: coordinate 4).
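To generate data of the kind shown in Fig. 1, one can sample from model (14) directly and then form the sample moments (18). This sketch uses the parameter values from the Fig. 1 caption with $\Delta = 10$, 100 observations as in the simulation study, and Python's standard-library gamma and beta generators; it is an illustrative re-implementation, not the GAUSS code used in the study.

```python
import random


def rdirichlet(rng, params):
    """Dirichlet draw: independent Gamma(params_j, 1) variables divided by their sum."""
    g = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(g)
    return [v / total for v in g]


def sample_model14(rng, n, x, theta, Delta, lam1, lam2):
    """n draws from model (14): alpha_i ~ Beta(lam1, lam2) and
    Y_i | alpha_i ~ Dirichlet(Delta * (alpha_i * x + (1 - alpha_i) * theta))."""
    obs = []
    for _ in range(n):
        a = rng.betavariate(lam1, lam2)
        params = [Delta * (a * xj + (1 - a) * tj) for xj, tj in zip(x, theta)]
        obs.append(rdirichlet(rng, params))
    return obs


def sample_moments(obs, x, q):
    """M_qj = (1/n) * sum_i (Y_ij - x_j)^q for every coordinate j, eq. (18)."""
    n = len(obs)
    return [sum((y[j] - x[j]) ** q for y in obs) / n for j in range(len(x))]


# Settings from the Fig. 1 caption, case (a): (lambda_1, lambda_2) = (2, 2), Delta = 10.
x = [0.2, 0.2, 0.2, 0.2, 0.2, 0.0]
theta = [0.0, 0.05, 0.1, 0.2, 0.4, 0.25]
rng = random.Random(5)
Y = sample_model14(rng, 100, x, theta, 10.0, 2.0, 2.0)
M1, M2, M3 = (sample_moments(Y, x, q) for q in (1, 2, 3))
```

Because every $Y_i$ and $x$ sum to 1, the first sample moments always sum to zero across coordinates, which is a convenient sanity check on any implementation.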

Description of the simulation

Simulations each consisted of 100 runs at 100 observations per run generated from the model (14). The sample size of 100 was chosen to be comparable to that of a typical source apportionment data set. Observations were six-dimensional with $x = (0.2, 0.2, 0.2, 0.2, 0.2, 0)'$ and $\theta = (0, 0.05, 0.1, 0.2, 0.4, 0.25)'$. A simulation was performed for each of three pairs of source contribution parameters $\{\lambda, \Lambda\}$: (i) $\{4, 5\}$, (ii) $\{2, 4\}$, and (iii) $\{1, 5\}$. Recalling that $E[1 - \alpha_i] = 1 - E[\alpha_i] = \lambda / \Lambda$ (see eq. (10)) and noting that

$$E[(1 - \alpha_i)^2] = \operatorname{Var}[\alpha_i] + (E[1 - \alpha_i])^2,$$

it is clear that (i) represents a very favorable estimation scenario, one in which observations tend to be close to the unknown source profile $\theta$. Similarly, (ii) and (iii) represent increasingly less favorable scenarios.

In addition, each simulation described above was performed at two values of $\Delta$: $\Delta = 10$ and $\Delta = 100$. Figs. 1a and 1b display plots of data simulated under (ii) for the two values of $\Delta$. Examination of the plots and review of eq. (13) make it clear that $\Delta = 100$ represents middling measurement error, while $\Delta = 10$ produces very severe error.

As a measure of performance, the median and the 90th percentile of the distance of each estimate from its true value are given.

Estimation of error scale, Δ

Although $\Delta$, the error scale parameter, is in effect a nuisance parameter, its estimation is important because estimates of the source contribution parameters, $\lambda$ and $\Lambda$, and the location parameter, $\theta$, depend directly upon $\Delta$. Also, since the severity of measurement error varies inversely with $\Delta$, that parameter is itself a measure of how well we may expect to estimate the parameters of interest.

It happens that each pairwise combination of observation coordinates, say $(j, k)$, produces an estimate $\Delta_{jk}$ of $\Delta$. Define for each coordinate $j$

$$A_j = (1 - 2x_j) m_{1j} + x_j (1 - x_j), \tag{19}$$

$$B_j = (\Delta + 1) m_{2j} - A_j, \tag{20}$$


$$C_j = (\Delta + 1)(\Delta + 2) m_{3j} - \left\{ 3(1 - 2x_j) B_j + m_{1j} [3x_j (1 - x_j)(\Delta - 2) + 2] + 2x_j (2x_j^2 - 3x_j + 1) \right\}, \tag{21}$$

where $m_{1j}$, $m_{2j}$, and $m_{3j}$ are as defined in eqs. (15)-(17). It is straightforward, if algebraically painful, to verify that for any pair of coordinates $(j, k)$,

$$\Delta = \frac{A_j m_{1k}^2 - A_k m_{1j}^2}{m_{2j} m_{1k}^2 - m_{2k} m_{1j}^2} - 1. \tag{22}$$

The estimator $\Delta_{jk}$ results by substituting the sample moments, $M_{qj}$ (see eq. (18)), into eq. (22).
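A pairwise estimate of this kind can be sketched from eqs. (15), (16), (19), and (20) alone: by (15) and (16), $B_j / m_{1j}^2 = \Delta \Lambda (\lambda + 1) / (\lambda (\Lambda + 1))$ is the same for every coordinate, so equating $B_j m_{1k}^2 = B_k m_{1j}^2$ gives a linear equation in $\Delta + 1$. The sketch assumes the pairwise estimator takes this form and evaluates it at population moments (Fig. 1 profiles, scenario (ii), $\Delta = 10$), so every pair should return 10; substituting the sample moments $M_{qj}$ instead gives the noisy estimates $\Delta_{jk}$. Function names are illustrative.

```python
def one_minus_alpha_moment(lam, Lam, k):
    """E[(1 - alpha)^k] when 1 - alpha ~ Beta(lam, Lam - lam)."""
    out = 1.0
    for i in range(k):
        out *= (lam + i) / (Lam + i)
    return out


def moments_12(xj, phij, Delta, lam, Lam):
    """m_1j and m_2j from eqs. (15) and (16)."""
    e1 = one_minus_alpha_moment(lam, Lam, 1)
    e2 = one_minus_alpha_moment(lam, Lam, 2)
    m1 = phij * e1
    m2 = (Delta * phij**2 * e2 + phij * (1 - 2 * xj) * e1 + xj * (1 - xj)) / (Delta + 1)
    return m1, m2


def A_coef(xj, m1j):
    """A_j = (1 - 2 x_j) m_1j + x_j (1 - x_j), eq. (19)."""
    return (1 - 2 * xj) * m1j + xj * (1 - xj)


def pairwise_delta(xj, xk, m1j, m2j, m1k, m2k):
    """Solve B_j m_1k^2 = B_k m_1j^2, with B = (Delta + 1) m_2 - A, for Delta."""
    Aj, Ak = A_coef(xj, m1j), A_coef(xk, m1k)
    num = Aj * m1k**2 - Ak * m1j**2
    den = m2j * m1k**2 - m2k * m1j**2
    return num / den - 1


# Fig. 1 profiles, scenario (ii) (lambda = 2, Lambda = 4), true Delta = 10.
x = [0.2, 0.2, 0.2, 0.2, 0.2, 0.0]
theta = [0.0, 0.05, 0.1, 0.2, 0.4, 0.25]
mom = [moments_12(xj, tj - xj, 10.0, 2.0, 4.0) for xj, tj in zip(x, theta)]
est = {(j, k): pairwise_delta(x[j], x[k], *mom[j], *mom[k])
       for j in range(6) for k in range(j + 1, 6)}
```

A pair in which both coordinates have $\theta_j = x_j$ would make the denominator vanish; the Fig. 1 profiles contain only one such coordinate, so all 15 pairs are usable here.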
However, the first-order bias and variance of $\Delta_{jk}$ increase with $|\theta_j - x_j|$ and $|\theta_k - x_k|$. From a heuristic viewpoint, one would expect the best estimates to result from coordinates for which $\theta_j = x_j$, in other words, for which the only variation is due to measurement error. Since $E[Y_{ij} - x_j] = c(\theta_j - x_j)$ ($c$ constant over $i$ and $j$), the $M_{1j}$ should contain information about the relative sizes of the quantities $|\theta_j - x_j|$. With these things in mind, we examined four estimates of $\Delta$, each a weighted average of the pairwise estimates with weights $w_{jk}$ on pair $(j, k)$ as follows:


DEST: $w_{jk}$ all equal, i.e., an unweighted average of the pairwise estimates.

DAWEST: weights depending on $|M_{1j}|$ and $|M_{1k}|$ (weight formula illegible in the source).

DMWEST: weights depending on $|M_{1j}|$ and $|M_{1k}|$ (weight formula illegible in the source).

DBEST: $w_{jk} = 1$ for the pair $(j, k)$ such that $|M_{1j}|$ and $|M_{1k}|$ are minimum (in other words, such that one coordinate of the pair has the smallest value of $|M_{1l}|$, $l = 1, \ldots, p$, and the other has the second smallest); $w_{jk} = 0$ otherwise.

A summary of the simulation results is given in Table 1. DMWEST and DBEST clearly outperform DEST and DAWEST. It is harder to distinguish between DMWEST and DBEST: although DBEST generally outperforms DMWEST slightly in terms of standard deviation and 90% distance, DMWEST tends to have a smaller 50% (median) deviation from the true parameter value. In both cases, reasonable estimates seem to be produced regardless of model parameterization.
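The weighting step can be sketched as follows. Only the two schemes whose weights are stated in full above are implemented (equal weights for DEST and the best-pair rule for DBEST); the pairwise estimates and first moments are hypothetical numbers, and the function names are illustrative.

```python
def weighted_delta(pairwise, weights):
    """Weighted average of pairwise estimates Delta_jk with weights w_jk."""
    total_w = sum(weights[pair] for pair in pairwise)
    return sum(weights[pair] * pairwise[pair] for pair in pairwise) / total_w


def equal_weights(pairs):
    """DEST: every pair gets the same weight."""
    return {pair: 1.0 for pair in pairs}


def dbest_weights(M1, pairs):
    """DBEST: weight 1 on the pair of coordinates with the two smallest |M_1j|, 0 elsewhere."""
    order = sorted(range(len(M1)), key=lambda j: abs(M1[j]))
    best = tuple(sorted(order[:2]))
    return {pair: (1.0 if pair == best else 0.0) for pair in pairs}


# Hypothetical first sample moments and pairwise estimates for p = 4 coordinates.
M1 = [-0.09, 0.004, 0.12, -0.02]
pairwise = {(0, 1): 13.0, (0, 2): 21.0, (0, 3): 12.5,
            (1, 2): 9.0, (1, 3): 10.4, (2, 3): 15.0}

dest = weighted_delta(pairwise, equal_weights(pairwise))
dbest = weighted_delta(pairwise, dbest_weights(M1, pairwise))   # uses pair (1, 3)
```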


Estimation  of  source  contribution  parameters 

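In this section we develop estimators first of $\{\lambda, \Lambda\}$ and then of $\theta$. Given $\Delta$, each coordinate of the observations provides information sufficient to estimate $\{\lambda, \Lambda\}$. In particular, it happens that for any coordinate $j$,

$$B_j = \frac{\Delta m_{1j}^2 \Lambda (\lambda + 1)}{\lambda (\Lambda + 1)}, \tag{23}$$

$$C_j = \frac{\Delta m_{1j} B_j (\lambda + 2) \Lambda}{\lambda (\Lambda + 2)}. \tag{24}$$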

Clearly we could substitute the sample estimates (18) in (23), (24), and the definitions of $B_j$ and $C_j$, and then solve the above system of equations for $\lambda$ and $\Lambda$, for any $j$. Reasoning that some coordinates may produce more reliable estimates than others, however, we may write

$$\sum_{j=1}^{p} w_j B_j = \frac{\Delta \Lambda (\lambda + 1)}{\lambda (\Lambda + 1)} \sum_{j=1}^{p} w_j m_{1j}^2, \tag{25}$$

$$\sum_{j=1}^{p} w_j C_j = \frac{\Delta \Lambda (\lambda + 2)}{\lambda (\Lambda + 2)} \sum_{j=1}^{p} w_j m_{1j} B_j \tag{26}$$

for any system of weights $w$. We will create estimates based on the solution of eqs. (25) and (26) for $\lambda$ and $\Lambda$, using several choices of weights and sample-based substitutions.
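Given $\Delta$ and a set of weights, eqs. (25) and (26) can be solved in closed form. Writing $R_1 = \sum_j w_j B_j / (\Delta \sum_j w_j m_{1j}^2)$ and $R_2 = \sum_j w_j C_j / (\Delta \sum_j w_j m_{1j} B_j)$, the equations read $R_1 = (1 + 1/\lambda)/(1 + 1/\Lambda)$ and $R_2 = (1 + 2/\lambda)/(1 + 2/\Lambda)$, which are linear in $u = 1/\lambda$ and $v = 1/\Lambda$. This closed form is a derivation offered here, not necessarily how the solution was computed in the study; the example builds $B_j$ and $C_j$ from (23) and (24) at scenario (ii) values and should recover $\lambda = 2$, $\Lambda = 4$.

```python
def solve_lambda_Lambda(R1, R2):
    """Solve R1 = (1 + 1/lam)/(1 + 1/Lam) and R2 = (1 + 2/lam)/(1 + 2/Lam).

    With u = 1/lam and v = 1/Lam: 1 + u = R1 (1 + v) and 1 + 2u = R2 (1 + 2v).
    """
    v = (2 * R1 - 1 - R2) / (2 * (R2 - R1))
    u = R1 * (1 + v) - 1
    return 1 / u, 1 / v


def ratios_from_weights(w, B, C, m1, Delta):
    """R1 and R2: eqs. (25) and (26) divided through by their known sums and Delta."""
    R1 = (sum(wj * Bj for wj, Bj in zip(w, B))
          / (Delta * sum(wj * mj**2 for wj, mj in zip(w, m1))))
    R2 = (sum(wj * Cj for wj, Cj in zip(w, C))
          / (Delta * sum(wj * mj * Bj for wj, mj, Bj in zip(w, m1, B))))
    return R1, R2


lam_true, Lam_true, Delta = 2.0, 4.0, 10.0
m1 = [-0.1, -0.075, -0.05, 0.0, 0.1, 0.125]   # m_1j for the Fig. 1 profiles, scenario (ii)
B = [Delta * m**2 * Lam_true * (lam_true + 1) / (lam_true * (Lam_true + 1)) for m in m1]      # (23)
C = [Delta * m * b * (lam_true + 2) * Lam_true / (lam_true * (Lam_true + 2))
     for m, b in zip(m1, B)]                                                                   # (24)
w = [1 / 6] * 6                                # hypothetical equal weights
lam_hat, Lam_hat = solve_lambda_Lambda(*ratios_from_weights(w, B, C, m1, Delta))
```

With sample moments in place of $m_{qj}$, the same two lines give estimates, provided $R_2 \ne R_1$ and the weighted sums in the denominators are not close to zero.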

In order to identify coordinates which should produce more reliable information than others, note that

$$\operatorname{Var}[Y_{ij} - x_j] = \frac{\Delta}{\Delta + 1} \operatorname{Var}[\alpha_i] (\theta_j - x_j)^2 + \frac{(m_{1j} + x_j)(1 - m_{1j} - x_j)}{\Delta + 1},$$

so that, since $m_{1j} = (\theta_j - x_j) E[1 - \alpha_i]$,

$$(\operatorname{CV}[Y_{ij} - x_j])^2 = \frac{\Delta}{\Delta + 1} \, \frac{\operatorname{Var}[\alpha_i]}{(E[1 - \alpha_i])^2} + \frac{(m_{1j} + x_j)(1 - m_{1j} - x_j)}{m_{1j}^2 (\Delta + 1)}. \tag{27}$$

As the first term does not depend on $j$ (and, apart from the factor $\Delta / (\Delta + 1)$, is exactly what would result if there were no measurement error), the coordinatewise CVs measure how much of the variation in each coordinate is due to measurement error, relative to the other coordinates. Theoretically, the most reliable information should be obtained from the coordinates having the highest proportion of their variation due to source contribution randomness, in other words, the coordinates with the lowest CVs. One approach might be to calculate source contribution estimates based only on the coordinate with the lowest sample CV. However, examination of eq. (27) suggests an approach which includes all of the data. The second term of the sum tends to decrease as $|m_{1j}|$ increases, in other words, as the distance between $x_j$ and $\theta_j$ increases. As the dimension of the observation increases, the $|m_{1j}|$ values will tend to decrease. However, amalgamating the observations to a few favorable dimensions can provide several coordinates with large values of $|m_{1j}|$ while retaining a correct parametric form. We chose to amalgamate to three coordinates, the smallest number which allows identification of the entire parameter space, and chose the particular amalgamation for which the coordinate with the lowest sample CV is retained and the others are added in such a way as to maximize the resulting sample $|M_{1j}|$ values. Better amalgamations might well exist.

TABLE 1

Estimation of measurement error parameter
Median and 90% absolute deviation of estimated error parameter from Δ (? = digit illegible in the source)

Parameter value      Median distance from Δ = 10
λ    Λ       DEST     DAWEST   DMWEST   DBEST
4    5       0.902    0.852    0.?02    0.859
2    4       1.71     1.74     1.21     1.4?
1    5       ?.6?     2.14     1.93     1.99

                     90% distance from Δ = 10
λ    Λ       DEST     DAWEST   DMWEST   DBEST
4    5       3.78     3.19     1.9?     2.30
2    4       10.4     7.?0     6.00     2.?9
1    5       11.4     12.7     10.1     7.00

                     Median distance from Δ = 100
λ    Λ       DEST     DAWEST   DMWEST   DBEST
4    5       12.1     10.6     8.53     8.65
2    4       20.0     13.6     8.2?     9.62
1    5       14.3     12.9     9.40     10.7

                     90% distance from Δ = 100
λ    Λ       DEST     DAWEST   DMWEST   DBEST
4    5       43.7     37.4     21.1     20.7
2    4       91.7     62.8     26.2     ?
1    5       69.5     57.1     26.3     24.7


Given  p  variate  observations  Yf,  /  =»  i . 

the  estimators  we  examined  are  as  follows: 

(1)  AMAL-MW: 

(a)  For  each  Yt,  create  R,  as  follows: 

Rn  =  Yim.  where  the  sample  CV  of  2.tm  is 
minimum  among  all  coordinates  of 
Ri2=  £  YtJ .  where  P'°{j  such  that 

jGP 

0)7 

Rf,  =  £  *here  *v  :o  { y  such  that 

jGS 

A/.,  <0)7 

(if  A'  is  empty,  let  A’ «  the  coordinate  of 
the  second  least  sample  CV  and  delete 
that  coordinate  from  P,  perform  analo¬ 
gous  operation  if  P  is  empty). 

(b) Create the analogous amalgamation of $x$.

(c) Substitute the DMWEST estimator of $A$ for $A$ in eqs. (25) and (26) and the definitions of $B_j$ and $C_j$.

(d) Substitute the sample expectations of $(R_{ij} - x_j)^q$, namely $n^{-1}\sum_{i=1}^{n}(R_{ij} - x_j)^q$, for $m_{qj}$ in eqs. (25) and (26) and the definitions of $B_j$ and $C_j$.

(e) Solve eqs. (25) and (26) for $\lambda$ and $\Lambda$ using $w = (1/3, 1/3, 1/3)$, resulting in estimators $\hat{\lambda}$ and $\hat{\Lambda}$, respectively.

(2) AMAL-BP: Same as AMAL-MW, except substitute DBEST for DMWEST in step (c).

(3) BCV-MW:

(a) Substitute $M_{qj}$ for $m_{qj}$ in eqs. (25) and (26) and the definitions of $B_j$ and $C_j$.

(b) Substitute the DMWEST estimator of $A$ for $A$ in eqs. (25) and (26) and the definitions of $B_j$ and $C_j$.

(c) Solve eqs. (25) and (26) for $\lambda$ and $\Lambda$ using $w_j = 1$ if coordinate $j$ has the least sample CV and $w_j = 0$ otherwise, resulting in estimators $\hat{\lambda}$ and $\hat{\Lambda}$, respectively.

(4) BCV-BP: Same as BCV-MW, except substitute DBEST for DMWEST in step (b).
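Step (a) of AMAL-MW can be illustrated with a short Python sketch. It is not the authors' code, and it rests on assumptions that the text leaves open. $M_{1j}$ is taken to be the sample mean of $Y_{ij} - x_j$. The sets $P$ and $N$ exclude the retained coordinate $m$. A coordinate with $M_{1j} = 0$ is placed in $P$. The function names are hypothetical, and at least three coordinates are assumed.

```python
from statistics import mean, stdev


def sample_cv(values):
    """Sample coefficient of variation: standard deviation over |mean|."""
    return stdev(values) / abs(mean(values))


def amalgamate(Y, x):
    """Hypothetical sketch of AMAL step (a).

    Y: list of n observations, each a list of p >= 3 proportions.
    x: known source profile of length p.
    Returns the three-coordinate observations R_i and the index sets
    (m, P, N), so that x can be amalgamated in the same way (step (b)).
    """
    p = len(x)
    columns = [[row[j] for row in Y] for j in range(p)]
    # Coordinates ordered by increasing sample CV; m has the smallest CV.
    order = sorted(range(p), key=lambda j: sample_cv(columns[j]))
    m, second = order[0], order[1]
    # Assumption: M_1j is the sample mean of Y_ij - x_j.
    M1 = [mean(columns[j]) - x[j] for j in range(p)]
    others = [j for j in range(p) if j != m]
    P = [j for j in others if M1[j] >= 0]   # ties at zero go to P (assumption)
    N = [j for j in others if M1[j] < 0]
    # Fallback from the text: an empty set receives the coordinate with
    # the second-least CV, which is deleted from the other set.
    if not N:
        N, P = [second], [j for j in P if j != second]
    elif not P:
        P, N = [second], [j for j in N if j != second]
    R = [[row[m], sum(row[j] for j in P), sum(row[j] for j in N)] for row in Y]
    return R, (m, P, N)


def amalgamate_profile(x, sets):
    """Step (b): apply the same amalgamation to the known profile x."""
    m, P, N = sets
    return [x[m], sum(x[j] for j in P), sum(x[j] for j in N)]
```

Because $m$, $P$ and $N$ partition the coordinates, each amalgamated observation sums to the same total as the original one, so the proportional form is preserved.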


TABLE 2

Estimation of source contribution parameter

Median and 90% absolute deviations of estimates from E[α]

A = 10

Median distance from E[α]
λ    Λ    AMAL-MW    AMAL-BP    BCV-MW    BCV-BP
4    5    0.19?      0.199      0.220     0.241
2    4    0.346      0.304      0.47?     0.479
1    5    0.676      0.663      0.46?     0.311

90% distance from E[α]
λ    Λ    AMAL-MW    AMAL-BP    BCV-MW    BCV-BP
4    5    0.346      0.?94      0.745     0.953
2    4    2.06       1.46       0.905     0.742
1    5    1.27       1.23       1.25      1.17

A = 100

Median distance from E[α]
λ    Λ    AMAL-MW    AMAL-BP    BCV-MW    BCV-BP
4    5    0.041      0.041      0.031     0.030
2    4    0.070      0.069      0.091     0.0?9
1    5    0.105      0.111      0.06?     0.061

90% distance from E[α]
λ    Λ    AMAL-MW    AMAL-BP    BCV-MW    BCV-BP
4    5    0.169      0.194      0.117     0.121
2    4    0.264      0.213      0.400     0.280
1    5    0.350      0.321      0.162     0.162

2% trimmed mean of E[α] estimates
λ    Λ    A      E[α]    AMAL-MW    AMAL-BP    BCV-MW    BCV-BP
4    5    10     0.2     0.070      0.025      -0.047    -0.116
4    5    100    0.2     0.193      0.193      0.221     0.221
2    4    10     0.5     -0.066     0.101      0.105     0.18?
2    4    100    0.5     0.518      0.523      0.364     0.551
1    5    10     0.8     0.347      0.236      0.536     0.459
1    5    100    0.8     0.772      0.833      0.789     0.790

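In each case, having produced estimators $\hat{\lambda}$ and $\hat{\Lambda}$ of $\lambda$ and $\Lambda$, we estimate $E[\alpha]$ by

$$\hat{\mu}_\alpha = 1 - \hat{\lambda}/\hat{\Lambda}$$

and $\theta$ by

$$\hat{\theta} = \mathbf{M}_1 \hat{\Lambda}/\hat{\lambda} + x$$

where $\mathbf{M}_1 = (M_{11}, \ldots, M_{1p})'$.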
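The plug-in formulas $\hat{\mu}_\alpha = 1 - \hat{\lambda}/\hat{\Lambda}$ and $\hat{\theta} = \mathbf{M}_1\hat{\Lambda}/\hat{\lambda} + x$ are simple enough to state as code. The sketch below makes two assumptions for illustration. First, $\mathbf{M}_1$ holds the sample means of $Y_{ij} - x_j$. Second, an observation behaves on average like the mixture $\alpha x + (1-\alpha)\theta$, so that $E[Y - x] = (\lambda/\Lambda)(\theta - x)$. The function name is hypothetical.

```python
def estimate_mean_and_location(lam_hat, Lam_hat, M1, x):
    """Plug-in estimates of E[alpha] and of the unknown profile theta.

    Assumes M1[j] is the sample mean of Y_ij - x_j, so that under the
    mixture assumption E[Y - x] = (lambda / Lambda) * (theta - x).
    """
    mu_alpha = 1.0 - lam_hat / Lam_hat
    theta_hat = [m * Lam_hat / lam_hat + xj for m, xj in zip(M1, x)]
    return mu_alpha, theta_hat


# Illustrative values only: lambda/Lambda = 0.5, so E[alpha] = 0.5.
print(estimate_mean_and_location(2.0, 4.0, [-0.3, 0.1, 0.2], [0.6, 0.2, 0.2]))
```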


Primary results of the simulation study, that is, the performance of the estimators as measured by median and 90% deviation from true values, are summarized in Table 2. For many scenarios the performance of the estimators was virtually indistinguishable. However, the performance of the BCV estimators relative to the AMAL estimators seemed to improve as the estimation scenario worsened. All of the functions estimated $E[\alpha]$ reasonably well in the case $A = 100$, with only slight decreases in performance (especially for the BCV estimators) as the parameterization became less favorable. Both types of estimator performed badly in the


TABLE 3

Estimation of location parameter

Euclidean distance to estimated location from θ = (0, 0.05, 0.1, 0.2, 0.4, 0.25)

A = 10

Median distance from θ
λ    Λ    AMAL-MW    AMAL-BP    BCV-MW    BCV-BP
4    5    0.07?      0.076      0.090     0.094
2    4    0.224      0.217      0.19?     0.197
1    5    0.312      0.317      0.314     0.294

90% distance from θ
λ    Λ    AMAL-MW    AMAL-BP    BCV-MW    BCV-BP
4    5    0.150      0.120      0.255     0.432
2    4    0.526      0.589      0.354     0.512
1    5    0.440      0.556      ?         1.15

A = 100

Median distance from θ
λ    Λ    AMAL-MW    AMAL-BP    BCV-MW    BCV-BP
4    5    0.020      0.020      0.020     0.020
2    4    0.057      0.057      0.076     0.082
1    5    0.195      0.16?      0.137     0.136

90% distance from θ
λ    Λ    AMAL-MW    AMAL-BP    BCV-MW    BCV-BP
4    5    0.111      0.132      0.080     0.0?0
2    4    0.346      0.333      0.410     0.430
1    5    1.04       1.72       0.398     0.614
case $A = 10$. Indeed, only in the most favorable (i) scenario did the estimates approach being passable, which is not surprising given the severity of the error.
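The two summary statistics reported in Table 2 can be computed from the replicated estimates as shown below. This is a sketch. The text does not say which percentile convention was used for the 90% point, so the nearest-rank definition here is an assumption, and the function name is hypothetical.

```python
import math
from statistics import median


def deviation_summary(estimates, true_value):
    """Median and 90% absolute deviation of estimates from true_value.

    The 90% point uses the nearest-rank percentile (an assumption; other
    percentile conventions give slightly different values).
    """
    devs = sorted(abs(e - true_value) for e in estimates)
    k = math.ceil(0.9 * len(devs)) - 1  # nearest-rank index of the 90th percentile
    return median(devs), devs[k]
```

For 100 replications, the 90% point is therefore the 90th smallest absolute deviation.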

For each estimator it is useful to know not only how much $\hat{\mu}_\alpha$ varies about $E[\alpha]$ but also whether $\hat{\mu}_\alpha$ tends to overestimate or underestimate $E[\alpha]$. Perhaps the most common measure of this tendency is the bias, $E[\hat{\mu}_\alpha] - E[\alpha]$. We could estimate it by averaging the $\hat{\mu}_\alpha$ values generated in each 100-replication simulation of a parameter scenario and subtracting the corresponding $E[\alpha]$ values. The sample averages of the $\hat{\mu}_\alpha$ estimates turned out to be highly unstable, however, invariably because of one or two outlandish observations. Instead, we give in Table 2 the 2% trimmed mean of the $\hat{\mu}_\alpha$ values (the average of the 96 central values) for each estimator and parameter scenario. In other words, we discarded the two greatest and the two least estimates and took the average of the remaining values. For $A = 10$, the estimators all have a severe tendency to underestimate $E[\alpha]$. For $A = 100$, on the other hand, the estimators exhibited only mild and somewhat sporadic bias. In fact, at least one estimator had a 2% trimmed mean bias of less than 0.02 in each parameter scenario. Because of their ratio form, the $\hat{\mu}_\alpha$ estimators will be biased for most parameter scenarios, but this bias does not appear to be serious when measurement error is not too severe.
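The 2% trimmed mean described above amounts to sorting the 100 estimates, dropping two from each end, and averaging the rest. Below is a minimal sketch (function name hypothetical), together with an illustrative example in which a few outlandish values shift the ordinary mean but not the trimmed one.

```python
def trimmed_mean(values, proportion=0.02):
    """Symmetric trimmed mean: drop the k smallest and k largest values,
    with k = round(proportion * n), and average the remaining ones.

    With n = 100 and proportion = 0.02 this discards two values at each
    end and averages the 96 central values.
    """
    xs = sorted(values)
    k = round(proportion * len(xs))
    core = xs[k:len(xs) - k]
    return sum(core) / len(core)


# Illustrative data: 96 well-behaved estimates plus four wild ones.
vals = [0.2] * 96 + [50.0, -40.0, 100.0, -100.0]
print(sum(vals) / len(vals), trimmed_mean(vals))
```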

Median and 90% distances of the estimated source profiles from the true source profile, $\theta$, are given in Table 3. Although these results should generally conform to the results for estimating $E[\alpha]$, it is interesting to note that the AMAL estimators performed relatively better than one would expect from that criterion alone. All estimators reflect the increasing difficulty of estimation as the parameter scenario worsens.
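Table 3 summarizes Euclidean distances between each estimated location and the true location $\theta = (0, 0.05, 0.1, 0.2, 0.4, 0.25)$ used in the simulations. Below is a sketch of that summary. It again assumes a nearest-rank 90% point, and the function name is hypothetical.

```python
import math
from statistics import median

# True location used in the simulation study (Table 3); it sums to one.
THETA = (0.0, 0.05, 0.1, 0.2, 0.4, 0.25)


def location_distance_summary(theta_hats, theta=THETA):
    """Median and 90% (nearest-rank) Euclidean distance of estimates from theta."""
    d = sorted(math.dist(t, theta) for t in theta_hats)
    k = math.ceil(0.9 * len(d)) - 1
    return median(d), d[k]
```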

Application  to  simulated  source  apportionment  data 

Currie et al. [14] describe the generation of three simulated data sets that were made available to participants of the Mathematical and Empirical Receptor Models Workshop (Quail Roost II). Each was constructed from reported source profiles and real meteorological data from St.
Louis  over  a  40-day  period  in  1976.  We  sum- 


TABLE 4

Estimation of E[α]

Quail Roost II, Data Set 1

Known source        Road            Steel           Coal            Wood
True value, E[α]    0.172           0.002           0.063           0.102

Estimate ± standard deviation
AMAL-MW             0.143 ± 0.041   0.016 ± 0.017   0.042 ± 0.026   0.314 ± 71.1
AMAL-BP             0.143 ± 0.046   0.017 ± 0.026   0.036 ± 0.0?7   0.556 ± 33.9
BCV-MW              0.163 ± 0.372   0.095 ± 0.073   0.12? ± 0.324   0.238 ± 0.069
BCV-BP              0.163 ± 0.314   0.095 ± 0.073   0.122 ± 0.430   0.237 ± 0.066

95% confidence interval
AMAL-MW             (0.084, 0.248)  (0.008, 0.031)  (0.023, 0.442)  (0, 1)
AMAL-BP             (0.085, 0.252)  (0.009, 0.351)  (0.011, 0.122)  (0.179, 1)
BCV-MW              (0, 0.245)      (0.082, 0.374)  (0.034, 0.216)  (0.144, 0.426)
BCV-BP              ?               (0, 0.115)      (0, 0.214)      (0.141, 0.402)

Distance between estimated location and true θ
AMAL-MW             0.034           0.009           0.012           0.262
AMAL-BP             0.034           0.008           0.013           0.886
BCV-MW              0.025           0.034           0.032           0.146
BCV-BP              0.025           0.034           0.030           0.144
Fig. 2. Quail Roost data. ‘Ambient’ proportional data are represented by circles, with the silicon component plotted against a carbon component. Squares represent corresponding values for proportional profiles of known sources.

marize  here  the  performance  of  our  moments 
estimators on the first of the data sets. This data set was based on eight source profiles, with observations contaminated by normal error. For each of the source profiles coal, road, steel, and wood, we fixed one profile as ‘known’ and attempted to estimate its influence with respect to the rest. The rest were aggregated as described below and treated as a single unknown. Since the ‘unknown’ is really known, we can test how well our methodology estimates it. Eighteen chemical species were used to define proportions: all of the species from which profiles were constructed in ref. 14, with the exception of As and CC (contemporary carbon).

Results are summarized in Table 4. Standard errors and confidence intervals were determined by the bootstrap methods described by Efron [15,16], using 1000 resampling replications. Actual values of source contributions, and therefore of the $\alpha_t$ values, are given in ref. 14. $E[\alpha]$ is taken to be the sample average of the $\alpha_t$ values. $\theta$ is obviously not, in fact, a constant parameter. However, using actual source contributions, one may calculate a composite source profile for each day. The ‘actual value’ of $\theta$ is taken to be the average of the daily composite profiles.
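The bootstrap standard errors and intervals can be sketched as follows. Several details are assumptions rather than statements from the text. The resampling unit is taken to be the daily observation. The interval is the simple percentile interval, although Efron describes several alternatives. The argument `estimator` stands for any function that maps a list of observations to an estimate of $E[\alpha]$.

```python
import random
from statistics import stdev


def bootstrap(data, estimator, n_boot=1000, level=0.95, seed=0):
    """Nonparametric bootstrap over observations.

    Resamples the observations with replacement n_boot times, recomputes
    the estimator on each resample, and returns the bootstrap standard
    error together with a percentile confidence interval.
    """
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2
    lo = reps[round(tail * n_boot)]
    hi = reps[round((1.0 - tail) * n_boot) - 1]
    return stdev(reps), (lo, hi)
```

Resampling whole observations keeps the components of each day's profile together, which matters because the proportions within a day are dependent.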

Estimation of $E[\alpha]$ ranged from excellent in the best case (‘road’) to poor in the worst case (‘wood’), with reasonable estimates in the other two cases. The algorithm does, in fact, appear to estimate the average composite profile as the location parameter, $\theta$. Fig. 2 may cast some light on the behavior of the estimates. The road profile lies on the ‘edge’ of the bulk of the data and in line with it, almost as if it were one of two contributing sources. Steel and coal, in the center of the data, appear to be ‘in between’ other sources, and neither is in line with the data. One would not expect to estimate either one as well as road. Wood, finally, is extremely far from the observed data, which would certainly be expected to cause problems. Estimates of $A$ also shed some light on the situation. For road and steel, all estimates were large and stable (> 1000 in the case of road). In the case of wood especially, estimates were unstable, perhaps an indication that the model assumptions are severely violated. (Recall that the simulated errors are additive and Gaussian rather than following our Dirichlet model.) It is reassuring to notice that the bootstrap standard errors and intervals identify the poor estimators as unreliable.

In addition, we attempted to estimate traditional CMB parameters using the principles outlined in the section on the CMB model, above. The simplest transformation to the CMB model may be carried out by substituting observed ambient mass profiles, $S_t$, for $s_t$ and an estimated source profile, $\hat{\theta}$, for $\theta$ above, and solving the appropriate equations. When the dimension of the observations is greater than the number of sources, as is the case here, one may select a subset of eqs. (5) and (6) to determine the parameters. In an attempt to base as many equations as possible on ‘known’ data, we chose to use all of the eqs. (5) and all of the eqs. (6) based on the ‘known’ source profile. Only one equation remained to identify the parameters. For it, we chose the equation from (6) based on the components of the unknown profile for which the observed CV was smallest and second-smallest. Using this method, we were able to estimate the total source contributions for ‘road’ quite adequately. Indeed, with one exception, we were able to estimate contributions to within a factor of 2 whenever road accounted for more than 0.6% of the total mass. (The exception was within a factor of 3, and estimated values were generally much closer to true values than a factor of 2 when road accounted for more than 5% of total mass.) Estimation of total source contributions for the other profiles was much less successful. Much of this effect arises because the composite of the remaining sources behaved very much like a single, second source in the case of road but not in the other cases. However, estimating total source contributions in this manner will be difficult whenever measurement error is severe enough to push a sizeable number of the observations ‘beyond’ the profiles $x$ and $\theta$, in the sense described in the section on nonparametric models with measurement error, above.


CONCLUSION 

Limitations of standard CMB models led us to introduce SASU models (source apportionment with one source unknown). In this paper, we have considered the case of one source known and one source unknown. Inherent to this situation are at least two interesting statistical problems. The first is estimation of a structural parameter in the presence of infinitely many incidental parameters. The second is estimation of a parameter which is not, in general, identifiable. The latter problem is easily addressed in the case of no measurement error by requiring that the unknown source is a support boundary (which is eventually attained) of the observation distribution. In the case of measurement error, it would appear that deconvolution methods are required in order to identify the unknown source in a completely nonparametric model. We have begun research in this area, but more work is needed before making recommendations.

Parametric models may present a reasonable, practical alternative to the nonparametric approach. The Dirichlet model examined appears promising. As an added benefit, it is easily generalizable to the case when there is more than one known source. A number of issues need to be considered, however. Given model (14), the source contribution parameters $\lambda$ and $\Lambda$ identify not only $E[\alpha]$ but all higher moments and, indeed, the exact shape of the distribution of $\alpha$. The role of the individual parameters $\lambda$ and $\Lambda$ is unclear if observations do not satisfy eq. (14). For the Quail Roost data, the magnitudes of the best-CV estimates of $\lambda$ and $\Lambda$ appeared reasonable, but the amalgamation estimators seemed to underestimate the magnitude quite severely. Research into this phenomenon is necessary. In general, the sensitivity of the method to the model assumptions needs study. Modifications of the model may be warranted, for example, allowing $A$ to vary either with time or with chemical species. Alternatives to moments estimates, such as maximum likelihood estimates, should be available given enough computing power. However, computing maximum likelihood estimators requires accurate estimators as starting values. The method of moments estimators should therefore be useful even if maximum likelihood estimators prove to be superior. Finally, other parametric models should be investigated.

The question of how best to transform from the SASU model to the standard CMB model, when enough data are available to do so, remains open. One may always substitute observed ambient air mass profiles, $S_t$, for $s_t$ and an estimated source profile, $\hat{\theta}$, for $\theta$. However, the presence of measurement error guarantees that the resulting estimates will not be consistent. We will continue to investigate this question.

A complete approach to the SASU problem will eventually require investigating numerous complications to the model. These include the case when the $x_k$ values are measured with error and the case when observations are correlated. One might also like to include observable covariates, such as weather or seasonal variables, in a reasonable model. Estimation in the case of more than one known source presents an interesting problem as well. Some analog of eq. (9) is probably necessary in order to identify the problem (a Dirichlet model imposes eq. (9) naturally). However, the geometric nature of the problem is somewhat different than in the one-source-known case. Research into these issues is underway.

When the unknown ‘source’ is actually an aggregate of several unknown sources, it is questionable whether one should model its profile $\theta$ as fixed. Instead, one might model $\theta_t$, the unknown profile at time $t$, as a stochastic process, either stationary or with a time trend, depending upon the nature of the unknown sources. In some situations it would be sensible to model $\theta_t$ as depending on a covariate.
An initial remark we would like to make is to note the interest of statisticians in the receptor modeling problem. This paper, along with the one elsewhere in these proceedings by L. Gleser, provides some of the first efforts to explore the receptor modeling problem as a statistical problem. We think that this particular form of the mixture resolution problem has a number of interesting aspects. The lack of constancy in the source profiles and the errors in sampling and analysis make receptor modeling different from mixture resolution using spectrometric data. Thus we welcome more statistical inputs and insights into the exploration of sources of airborne pollutants.

The next aspect of this paper that needs to be discussed is that of facilitating communication. It is clear from the paper that receptor modelers have not defined their terminology clearly enough for people entering the field to adopt our jargon immediately. The paper suggests that the problem the authors are solving is that of the chemical mass balance (CMB). However, as this term is commonly used within the receptor modeling community, it refers to the resolution of a single sample into its components based on a set of source profiles that are known a priori. In the approach outlined here, a number of samples are used to deduce the profile of the ‘unknown’ source when one or more profiles are known. The samples are then used to obtain the mass contributions of the known sources. Because it requires multiple samples, this method falls into the multivariate methods category as outlined by Cooper and Watson [1]. As such, it seems that this new method should be compared with other methods that attempt to deduce source profiles. These include absolute principal components analysis [2], target transformation factor analysis (TTFA) [3] and SAFER [4].

The model presented in this paper suffers from the need for a basic assumption that the ‘unknown’ source is constant in composition. However, if the ‘unknown’ source is really a combination of sources, then this assumption is unlikely to be valid. In a complex urban airshed, wind direction shifts can drastically alter the number and types of sources [5]. Even at more remote sites, there can be highly significant seasonal variations in the composition of emissions from various sources. The applicability and utility of this approach relative to the traditional multivariate approaches is therefore not at all clear.

Before getting into other, more detailed comments on the source apportionment with one source unknown (SASU) methods, we would like to raise some other issues regarding communication. This paper is written by statisticians for statisticians and has therefore been written in ‘statistics’. However, for us armchair statisticians, it is very difficult to read and digest, because we first have to translate it from symbolic notation into terms we can follow. We realize that this paper takes advantage of symbols commonly used in the statistics literature, such as ∈, ∀, and [0, 1]. We suspect that most readers of this journal will not be able to follow the arguments easily because they get lost in the symbols. We suggest that, although it is cumbersome to do so, these symbols should generally be avoided in papers written for non-statisticians to read.

We would also urge that theorems, propositions and the like be relegated to appendices rather than breaking the flow of the reasoning in the text. We recognize the heresy of this proposal, but offer it notwithstanding in order to improve communication with the non-statistician.

There are a number of other aspects of this paper that we would like to discuss. The authors suggest that the CMB model cannot deal with time-varying source contributions. CMB analysis does not deal with time variation at all because it is performed on only one sample. Time variations in source contributions could only be found by performing a series of CMB analyses on a sequence of samples. Time variation in the source profiles is normally not incorporated because multiple source samples are not often taken at the same time as the ambient samples. However, only the financial and access constraints that often plague field studies preclude the incorporation of time variation of the source profiles in the CMB calculations. An alternative approach to incorporating systematic time variations would be to use Kalman filtering. It would appear feasible to use this method to take such time variation into account. Although it has not yet been studied in the context of the receptor modeling problem, the Kalman filter appears to be a method worthy of further exploration.
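A textbook scalar Kalman filter makes this suggestion concrete. It tracks a quantity that drifts as a random walk and is observed with noise. The mapping of the state to, say, one element of a source profile measured on successive samples is an illustrative assumption, as are the noise variances `q` and `r`. This is not a method worked out in the text, and the function name is hypothetical.

```python
def local_level_filter(observations, q, r, s0=0.0, p0=1e6):
    """Scalar Kalman filter for a random-walk (local level) state.

    State model:        s_t = s_{t-1} + w_t,  Var(w_t) = q
    Observation model:  z_t = s_t + v_t,      Var(v_t) = r
    Returns the filtered state estimate after each observation.
    """
    s, p = s0, p0  # a large p0 expresses a vague initial guess
    estimates = []
    for z in observations:
        p = p + q              # predict: the state may drift between samples
        k = p / (p + r)        # Kalman gain
        s = s + k * (z - s)    # update with the new measurement
        p = (1.0 - k) * p
        estimates.append(s)
    return estimates
```

The ratio of `q` to `r` controls the trade-off. A larger `q` lets the estimate follow genuine changes in the profile faster, at the cost of passing more measurement noise through.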

The paper states that the use of additive, Gaussian error structures may be a limitation of a CMB analysis. The stated reason is that the observations should be non-negative and may be constrained, as when the measurements are proportions summing to one. One of the continuing problems in air quality data handling is the compulsion to left-truncate data. Most of our chemical analytical methods have demonstrably symmetric error bands on the results, even if the errors are not truly Gaussian. For many airborne particle analyses based on photon spectroscopy, such as neutron activation or X-ray fluorescence, we know that the count data from which the concentrations are determined have a Poisson distribution. The additivity of the uncertainties can therefore be explicitly calculated. Thus, if the sample does not contain the analyte of interest, a measured value less than zero is a valid result. Too many people will then set the value to zero because they misunderstand the effects of the measurement error. Thus, some of the starting premises of this work seem to be in error.
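The point about negative results can be checked with a small simulation. For illustration only, it assumes that a blank sample yields gross and background counts that are both Poisson with the same mean (20 counts here), so the net signal is their difference. The parameter values are arbitrary, and the function names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method for a Poisson(lam) variate (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def blank_measurements(background=20.0, n=10000, seed=1):
    """Net counts (gross minus background) for samples with no analyte."""
    rng = random.Random(seed)
    return [poisson(background, rng) - poisson(background, rng) for _ in range(n)]


net = blank_measurements()
mean_net = sum(net) / len(net)                          # close to zero
frac_negative = sum(v < 0 for v in net) / len(net)      # a little under one half
mean_truncated = sum(max(v, 0) for v in net) / len(net) # clearly positive
print(mean_net, frac_negative, mean_truncated)
```

With these settings, the net values average close to zero and a little under half of them are negative. Replacing the negatives by zero leaves a clearly positive average, which is the upward bias that left-truncation introduces.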

In the nonparametric model, the authors suggest that, in the limit of sufficiently large numbers of samples being taken and analyzed, there will be one sample composed almost entirely of the species contributed by the ‘unknown’ source. This assumption again raises the problem of the constancy of the mixture of unknown sources that constitute the ‘unknown’ source. Although wood stoves would not be used in the summer, there may be other sources that are active in the summer but not in the winter. The real situation is not likely to be as simple as portrayed here.

It also appears to be necessary to know the probability distribution of the ‘unknown’ source contributions, G(α). For no source has this been done to the extent that the distribution of values is known. Thus, at this time, this approach does not appear to provide practical help to the receptor modeler. This is particularly so in light of the other problems that arise when measurement error is introduced into the model.

One of the problems with the use of proportional data is that the results will ultimately need to be back-transformed into absolute concentrations (µg/m³) to be used by air quality managers. If the method is to be applied to real air quality management problems, a way of giving such values with associated error bounds will have to be provided.

In the parametric model studies, the simulated data were assumed to have identical and constant errors for all chemical species from all of the sources. The authors note that this is unrealistic. We would encourage further study with more realistic error structures, so that any points at which the analysis runs into problems can be identified.
Finally, in the analysis of the Quail Roost II data set, it is interesting that SASU was able to estimate the STEEL source even though it was well below the ‘detection limits’ as defined by Currie et al. [6]. It seems surprising that ‘WOOD’ was so poorly estimated, as it could be found relatively well using the other multivariate methods [6]. It would be interesting to know whether the choice of Si and C is unique in showing the results presented in Fig. 2, or whether other pairs of variables show the same pattern. The results on the Quail Roost data also suggest that the Dirichlet distribution was not a very good representation of the needed distribution for these data sets. These sets are based on a reasonably realistic data generation model. This suggests a need to explore distributions other than the Dirichlet, to find one that better represents air quality data distributions.

In conclusion, we welcome the increased input of statisticians into the receptor modeling field. We hope that we can open lines of communication so that the problems examined can relate better to actual receptor modeling problems. We also ask the statistics community to be patient with those of us who are not fluent in symbolic logic and therefore find great difficulty in reading and understanding the work being presented.
A number of mathematical approaches that are currently of interest in theoretical combustion are briefly described. These are: (1) activation energy asymptotics (flame-sheets and hot-spots); (2) bifurcations and routes to chaos; (3) turbulent premixed flames (fractals and renormalization); (4) reduced chemistry and rate-ratio asymptotics; (5) nonlinear high-frequency acoustics and combustion.


PROLOGUE 

With rare exceptions, combustion is fluid mechanics with the addition of highly exothermic, temperature-sensitive chemical reaction. Progress in combustion theory has therefore been closely linked to the tools developed to deal with the reaction terms, and this is apparent in the topics discussed here. Section 1 briefly describes a successful asymptotic treatment based on the extreme sensitivity of the reaction rate to temperature variations. This can lead to flame-sheet models in which reaction is confined to thin layers. Such models provide a powerful tool for examining flame stability, the subject of Section 2. At high Reynolds numbers the role of chemistry is reduced to generating a hydrodynamic flame: a temperature and density discontinuity separating two inviscid flow fields (Section 3). More subtle aspects of the chemical kinetics play a role in Section 4, which describes a rational procedure for reducing complex kinetic systems to reduced sets involving three or four reaction steps. Our discussion concludes in Section 5 with the interaction of high-frequency acoustic waves and a combustion field. Of particular interest is the fact that a small-amplitude nonlinear periodic wavetrain can accelerate a temperature-sensitive reaction.


1 ACTIVATION ENERGY ASYMPTOTICS - FLAME-SHEETS AND HOT-SPOTS

It is commonplace in combustion theory to adopt a simple one-step kinetic model characterized by Arrhenius kinetics. For premixed flames this might have the form

mixture → products

at a rate

$$\Omega = D\,Y e^{-\theta/T} \qquad (1.1)$$
Fig. 1. Flame-sheet separating two regions of frozen flow. This is typical of the structure seen in diffusion flames [1].

where $Y$ is the mixture fraction and $T$ the temperature. For diffusion flames,

fuel + oxygen → products

at a rate

$$\Omega = D\,XY e^{-\theta/T} \qquad (1.2)$$

where $X$ ($Y$) is the oxygen (fuel) mass fraction and $\theta$ is a nondimensional activation energy or activation temperature.

Asymptotic  treatments  are  possible  in  the  limit 
0  -*  co  and  have  proven  to  be  of  great  value  m 
elucidating  a  wide  range  of  combustion  phenom¬ 
ena  (1-3]  For  some  problems  the  asymptotics 
lead  to  flame-sheets,  thin  regions  in  which  there  is 
a  balance  between  diffusion  and  reaction;  beyond 
the  flame-sheet  reaction  is  negligible.  This  comes 
about  by  considering  the  distinguished  limit 

D->  oo,  0  —*  co,  D**e$/T*>  T*  fixed  (1.3) 

This immediately leads to a partition of the flow-field into regions where T < T_*, so that Ω → 0 (frozen chemistry), and regions where T > T_*, so that Y → 0 or XY → 0 (equilibrium chemistry), and again Ω → 0 for the irreversible kinetics of eqs. (1.1) and (1.2)*.

The thin reaction zone or flame-sheet is characterized by

$$T = T_* + O(1/\theta) \tag{1.4}$$


* In a special but important case, the plane deflagration, an unbounded region of equilibrium gas exists where T = T_*.


(see, e.g., Fig. 1). These flame-sheet structures are well understood, and the approach is a well-established and proven tool.

A quite different class of problems involves hot-spot formation and ignition. Consider the following simple model for homogeneous thermal ignition:

$$\frac{dT}{dt} = e^{-\theta/T}, \qquad T(0) = T_0 \tag{1.5}$$

Adopting the ansatz

$$T = T_0\left(1 + \frac{T_0}{\theta}\,\phi + \cdots\right) \tag{1.6}$$

the perturbation function φ satisfies the initial-value problem

$$\frac{d\phi}{d\tau} = e^{\phi}, \qquad \phi(0) = 0 \tag{1.7}$$

where τ = (θ/T_0²) e^{-θ/T_0} t is a scaled time. This has solution

$$\phi = -\ln(1 - \tau) \tag{1.8}$$

valid for 0 < τ < 1. Thermal runaway occurs at τ = 1. In nonhomogeneous problems, runaway is confined to a small region called a hot-spot. A well-known example occurs in a certain type of deflagration-to-detonation transition [4]. A weak shock is generated by the accelerating flame, and in the shocked gas a hot-spot forms and gives rise to an expanding shock which interacts with, and reinforces, the lead shock (Fig. 2).
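Eqs. (1.7) and (1.8) are easy to check numerically. The sketch below integrates dφ/dτ = e^φ with a classical fourth-order Runge-Kutta scheme (our choice of integrator, not a method from the text) and compares the result with φ = -ln(1 - τ). It also converts the runaway point τ = 1 back to physical time using the scaling τ = (θ/T_0²) e^{-θ/T_0} t that follows from substituting the ansatz (1.6) into (1.5). The helper names are hypothetical.

```python
import math


def rk4(f, y0, t_end, h):
    """Classical fourth-order Runge-Kutta for a scalar ODE dy/dt = f(t, y) from t = 0."""
    t, y = 0.0, y0
    for _ in range(int(round(t_end / h))):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


def phi_numeric(tau, h=1e-4):
    """Numerical solution of eq. (1.7): dphi/dtau = exp(phi), phi(0) = 0, for 0 <= tau < 1."""
    return rk4(lambda t, p: math.exp(p), 0.0, tau, h)


def phi_exact(tau):
    """Closed form (1.8); it diverges as tau -> 1, which is the thermal runaway."""
    return -math.log(1.0 - tau)


def runaway_time(theta, T0):
    """Physical time for tau = 1 under tau = (theta / T0**2) * exp(-theta / T0) * t."""
    return T0**2 / theta * math.exp(theta / T0)


for tau in (0.5, 0.9, 0.99):
    print(f"tau = {tau}: RK4 {phi_numeric(tau):.6f}, exact {phi_exact(tau):.6f}")
```

The exponential factor in `runaway_time` shows why large activation energies make ignition delays so sensitive to the initial temperature T_0.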



Fig. 2. Hot-spot formation and initiation of a shock in deflagration-to-detonation transition (cartoon based on plate 5 of ref. 4).
Fig. 3. Pressure distribution at different times in an interior hot-spot. From ref. 5 with permission.


Only recently have these hot-spots been analyzed for a compressible gas, and Fig. 3 shows the early pressure rise for an interior hot-spot (one not next to a wall) [5]. Density changes are shown in Fig. 4; the process is so rapid that no significant mass flux can occur, and these changes are small (inertial confinement). Recent efforts have been concerned with the consequences of hot-spot formation [6,7].


Fig. 4. Density distribution at different times in an interior hot-spot. From ref. 5 with permission.


2  BIFURCATIONS  AND  ROUTES  TO  CHAOS 

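The constant-density model for premixed flames can take the form (see ref. 2, p. 25)

$$\frac{\partial T}{\partial t} - \nabla^2 T = \Omega, \qquad \frac{\partial Y}{\partial t} - \frac{1}{Le}\nabla^2 Y = -\Omega \tag{2.1}$$

with Ω given by eq. (1.1). Here Le is the Lewis number, and values of Le different from 1 can give rise to Turing instabilities [8].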

As noted in Section 1, in the limit θ → ∞ reaction is confined to a thin flame sheet. Indeed, for deflagrations that are nominally plane and adiabatic, Ω behaves like a Dirac δ-function of strength ~ e^{-θ/2T_*}, where T_* is the flame temperature. It is then not difficult to construct a stationary solution (unchanging flame propagation), whose linear stability can be explored using a modal analysis. If the flame-sheet displacement is

$$x_s = -V_{ad}\,t + \epsilon\, e^{\omega t + i k y}, \qquad \epsilon \to 0 \tag{2.2}$$

where the unperturbed flame propagates to the left at the adiabatic flame speed V_ad, the stability diagram of Fig. 5 can be constructed [9].

In the neighborhood of P, long-wavelength disturbances grow very slowly, and weak nonlinearities can be incorporated into the analysis by means of a bifurcation analysis. In this way the Kuramoto-Sivashinsky equation can be derived [10] for φ ~ x_s + V_ad t, and when corrugations in the z direction are also admitted this is

$$\phi_t + \tfrac{1}{2}(\nabla\phi)^2 = -\nabla^2\phi - 4\nabla^4\phi \tag{2.3}$$


Fig. 5. Stability boundaries in the wavenumber-scaled Lewis number plane.
The first term on the right is viscous-like with a negative viscosity coefficient and is strongly destabilizing; the ∇⁴ term stabilizes short waves. Numerical simulations show that the flame-sheet adopts an irregular, unsteady, cellular configuration (Fig. 6). Physical flames in mixtures with Le < 1 can display similar behavior (Fig. 7) [11] (see also ref. 1, p. 194).
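A minimal sketch of how eq. (2.3) can be integrated in one space dimension (no z corrugations) on a periodic domain. This is not the simulation behind Fig. 6; the domain length, resolution, time step and initial noise are hypothetical choices. The stiff linear terms are treated implicitly in Fourier space and the nonlinear term explicitly (first-order semi-implicit Euler), with 2/3-rule dealiasing. A mode exp(ωt + ikx) of the linearized equation grows at ω = k² - 4k⁴, so wavenumbers 0 < k < 1/2 are unstable, matching the roles of the two right-hand terms described above.

```python
import numpy as np


def ks_growth_rate(k):
    """Linear growth rate of exp(omega*t + i*k*x) in eq. (2.3):
    -lap(phi) contributes +k**2 (destabilizing), -4*lap**2(phi) contributes -4*k**4."""
    return k**2 - 4.0 * k**4


def simulate_ks_1d(length=100.0, n=256, dt=0.02, steps=5000, amp=1e-3, seed=0):
    """Integrate phi_t + 0.5*phi_x**2 = -phi_xx - 4*phi_xxxx on the periodic domain [0, length)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, length, n, endpoint=False)
    k = 2.0 * np.pi * np.fft.rfftfreq(n, d=length / n)   # angular wavenumbers
    lin = ks_growth_rate(k)                              # linear operator in Fourier space
    keep = k < (2.0 / 3.0) * k.max()                     # 2/3-rule dealiasing mask
    phi_hat = np.fft.rfft(amp * rng.standard_normal(n))  # small random initial wrinkles
    for _ in range(steps):
        phi_x = np.fft.irfft(1j * k * phi_hat, n)        # spectral derivative
        nonlin_hat = np.fft.rfft(-0.5 * phi_x**2) * keep
        # explicit nonlinear term, implicit linear terms
        phi_hat = (phi_hat + dt * nonlin_hat) / (1.0 - dt * lin)
    return x, np.fft.irfft(phi_hat, n)


x, phi = simulate_ks_1d()
print(phi.min(), phi.max())
```

The implicit treatment removes the severe time-step limit of the fourth-derivative term. Longer runs (larger `steps`) are needed before the nonlinear cellular regime develops, and `dt` may have to be reduced for accuracy there.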

Fig. 5 shows the stability boundaries for an unbounded flame. If we consider flames that are attached to burners, accounting for the heat flux to the burner, the left stability boundary is modified (Fig. 8). If at the same time the burner geometry restricts the wave number k to discrete values, discrete points on this boundary are defined, each of which is a potential bifurcation point from which a nonplanar solution can spring. These various solutions can interact (e.g., bimodal bifurcations) and display interesting dynamical behavior. Analysis [12-14] can explain the behavior of polyhedral flames, the multiple-sided flames sometimes seen on Bunsen burners (Fig. 9). These are sometimes stationary, sometimes they spin, and the number of sides can be changed by varying the combustion parameters (mass flow-rate, mixture strength).

The right stability boundary of Fig. 5 is relatively inaccessible to physical mixtures but has a counterpart in the analysis of thermites, which are solids that burn to form solids and so have Le = ∞. In the k-θ plane (θ is no longer asymptotically large [15]), and again with k restricted to discrete values, possible bifurcation points are identified (Fig. 10).


Fig. 7. Cellular flames, courtesy of M. Gorman [11]. An optical illusion can make these look like liquid drops on the underside of a plate, with the white regions corresponding to convex surfaces. In fact these are top views of the flame with concave or cup-like white regions, each cup being surrounded by a multiple-sided sharp ridge. As with many optical illusions, persistence will cause the image to 'flip'.


Fig. 8. Modification of the left stability boundary of Fig. 5 by heat losses, showing possible bifurcation points when k is restricted to discrete values.
Fig. 9. Polyhedral flame. A cartoon based on a photograph in ref. 1.

Fig. 10. Possible bifurcation points corresponding to planar corrugations of thermite flames. A similar figure can be constructed for cylindrical geometry.

A rich dynamic structure is associated with bifurcations from the right stability boundaries [16-18]. Fig. 11 shows variations of the flame speed with time for a problem discussed in ref. 18 and exhibits 2T-periodic behavior. Fig. 12 corresponds to slightly different parameter values from those of Fig. 11 and apparently displays chaotic behavior.

Fig. 11. Flame-front velocity vs. time in thermite burning. From ref. 18 with permission. This displays 2T-periodic behavior.

Fig. 12. Flame-front velocity vs. time in thermite burning. From ref. 18 with permission. This appears to display chaotic behavior.

3 TURBULENT PREMIXED FLAMES - FRACTALS AND RENORMALIZATION

Fig. 13. Flame images in an internal combustion engine at 2400 rpm. From ref. 19 with permission. The equivalence ratios are 0.9, 0.8, 0.7 (top to bottom).

Fig. 13 shows premixed flame images obtained in a laboratory engine at Princeton University [19,20]; the flame is the boundary between the products (white) and the reactants (black). These images are typical of turbulent flames, and one may ask whether or not the flame is a fractal surface. To answer this question it is necessary to measure the surface area using 'rulers' of different size, plotting the area vs. the 'ruler' length on a log-log plot (Fig. 14). Between large- and small-scale cutoffs, a fractal surface is characterized by a straight line of slope (2 - D), 2 < D < 3, where D is the fractal dimension. Note that fractal behavior is only observed over a 1-decade range of length scales, and this may be too small for the concept to be of value.
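A minimal sketch of the 'ruler' measurement described above: given areas measured at several ruler sizes inside the cutoff range, a least-squares line in log-log coordinates has slope 2 - D, so D = 2 - slope. The data below are synthetic (a hypothetical D = 2.35), not the measurements of refs. 19-21; with real data the cutoffs would have to be identified before fitting.

```python
import numpy as np


def fractal_dimension(ruler, area):
    """Estimate D from area-vs-ruler data lying between the cutoffs.

    A fractal surface gives log(area) = const + (2 - D) * log(ruler).
    """
    slope, _intercept = np.polyfit(np.log(ruler), np.log(area), 1)
    return 2.0 - slope


# Synthetic, noise-free data over one decade of ruler sizes (hypothetical D = 2.35).
ruler = np.logspace(-1, 0, 8)
area = 3.0 * ruler ** (2.0 - 2.35)
print(round(fractal_dimension(ruler, area), 3))  # 2.35
```

Because the fit uses only one decade of scales, as in the measurements discussed above, small amounts of noise translate into a noticeable uncertainty in D.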

Turbulent flames travel faster (W_turb) than laminar flames (W_L) because of the enhanced average burning area generated by the wrinkling. Discarding other effects (e.g. flame-stretch, ref. 1, p. 146),

$$\frac{W_{turb}}{W_L} = \frac{A}{A_0} \tag{3.1}$$

(see Fig. 14), and Gouldin [22] has used this idea to predict turbulent flame speed as a function of turbulent intensity. Fig. 15 shows some of his results. For other mixtures the agreement is not as good; moreover, Gouldin's choice of l_i (the Kolmogoroff length scale) has been questioned [23]. Nevertheless, the agreement is encouraging.
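If the area-scale line of slope (2 - D) holds between the inner and outer cutoffs l_i and l_o, and A_0 is taken as the area resolved at the outer cutoff (an assumption made here), then A/A_0 = (l_o/l_i)^(D-2), which with (3.1) gives a fractal estimate of W_turb. A minimal sketch; the laminar speed, D and scale ratio are hypothetical inputs, and Gouldin's model [22] involves modelling choices (such as how l_i is chosen) that are not reproduced here.

```python
def area_ratio(D, l_outer, l_inner):
    """A/A0 for a fractal surface between cutoffs.

    Area scales as ruler**(2 - D), so A(l_inner) / A(l_outer) = (l_outer / l_inner)**(D - 2).
    """
    return (l_outer / l_inner) ** (D - 2.0)


def turbulent_speed(W_L, D, l_outer, l_inner):
    """Eq. (3.1): W_turb = W_L * A / A0, with flame-stretch and other effects neglected."""
    return W_L * area_ratio(D, l_outer, l_inner)


# Hypothetical inputs: W_L = 0.4 m/s, D = 2.35, one decade between the cutoffs.
print(turbulent_speed(0.4, 2.35, 10.0, 1.0))
```

A smooth surface (D = 2) gives A/A_0 = 1 and recovers the laminar speed; the sensitivity to l_o/l_i is why the choice of the inner cutoff matters.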

Some related mathematical treatments have dealt with the kinematic flame equation

$$\frac{\partial G}{\partial t} + (\mathbf{v}\cdot\nabla)G = W_L\,|\nabla G| \tag{3.2}$$

which governs a scalar function G(x, t), where the surface G = 0 represents the flame. This surface is convected by the flow field v and propagates relative to the fluid at the laminar flame speed. Given a turbulent flow v, we can ask what turbulent flame speed will be predicted by this equation [24-26].

Fig. 14. Area vs. scale for a fractal surface. The data points are obtained from ref. 21, corresponding to a tube-burner flame.

Fig. 15. Turbulent flame-speed vs. turbulent intensity. From ref. 22 with permission.

The turbulent field is characterized by a wide range of scales l, where l_o > l > l_i (outer and inner cut-offs), and we define

$$\bar G(l_1) = \langle G(\mathbf{x}, t) \rangle_{l_1} \tag{3.3}$$

the average of G over all length scales l_1 > l > l_i. Similarity on the different length scales implies that

$$\frac{\partial \bar G(l_1)}{\partial t} + \big(\bar{\mathbf v}(l_1)\cdot\nabla\big)\bar G(l_1) = W_{turb}(l_1)\,\big|\nabla \bar G(l_1)\big| \tag{3.4}$$

where W_turb(l_1) is a 'partial' turbulent flame-speed associated with wrinkling on the scales smaller than l_1. By definition

$$W_{turb}(l_1) \to W_L \ \text{as}\ l_1 \to l_i, \qquad W_{turb}(l_1) \to W_{turb} \ \text{as}\ l_1 \to l_o \tag{3.5}$$

so that W_turb can be calculated if the averaging procedure leading to eq. (3.4) can be carried out. Existing analyses yield (ibid.)


(3.6) 
Fig. 16. Calculated structure of a wet CO flame using the complete mechanism and the short mechanism. From ref. 27 with permission.


or 

f  U  ],/2 

In^j  (3  7) 


4 REDUCED CHEMISTRY AND RATE-RATIO ASYMPTOTICS

The chemistry of physical flames is extremely complicated, presenting an insurmountable obstacle to analysis and a severe challenge to numerical simulations unless substantial simplifications are introduced. Consider, for example, wet CO flames [27]. A complete description of the kinetics involves 67 steps with rates characterized by 162 nonzero parameters and a commensurate number of reactants. Even after unimportant reactions are discarded, 21 steps remain, governing 10 species. (The accuracy of such short mechanisms can be checked by comparing the flame structures they yield with exact calculations, Fig. 16.) Clearly, additional simplification is necessary, and two simple ideas play an important role in this connection: the steady-state approximation for an intermediate and the quasi-equilibrium approximation for a reaction.

Consider the ith species. Its variation due to reaction can be written in the form

$$\frac{dc_i}{dt} = w_i^{+} - w_i^{-} \tag{4.1}$$

where w_i^+ refers to the positive contribution from the various reactions (production) and w_i^- refers to the negative contribution (consumption). The steady-state approximation, valid if dc_i/dt is small compared to each term on the right, is

$$w_i^{+} = w_i^{-} \tag{4.2}$$
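The steady-state approximation (4.2) can be checked on the textbook chain A → B → C with first-order rate constants k1 and k2 (a hypothetical example, not taken from refs. 27 or 28). Because the system is linear, the exact intermediate concentration has a closed form, which can be compared with the steady-state estimate obtained from production k1[A] = consumption k2[B].

```python
import math


def b_exact(t, k1, k2, a0=1.0):
    """Exact [B] for A -> B -> C with rate constants k1, k2 (k1 != k2), [A](0) = a0, [B](0) = 0."""
    return k1 * a0 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))


def b_steady_state(t, k1, k2, a0=1.0):
    """Steady-state approximation (4.2): k1*[A] = k2*[B], with [A] = a0*exp(-k1*t)."""
    return k1 * a0 * math.exp(-k1 * t) / k2


k1, k2 = 1.0, 100.0  # hypothetical: B is consumed much faster than it is produced
for t in (0.001, 0.1, 1.0):
    print(t, b_exact(t, k1, k2), b_steady_state(t, k1, k2))
```

After the initial transient (t much larger than 1/k2) the ratio of exact to approximate values is k2/(k2 - k1), close to 1 when k2 ≫ k1. At very early times dc_B/dt is not small compared with the production and consumption terms, and the approximation fails, as the validity condition above predicts.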

If we just examine the change in c_i due to the jth reaction, then

$$\left.\frac{dc_i}{dt}\right|_{j} = k_f - k_r \tag{4.3}$$

where k_f (k_r) is the forward (reverse) reaction rate, and the quasi-equilibrium approximation is

$$k_f = k_r \tag{4.4}$$
When these approximations are correctly applied, substantial simplification is possible and yet reasonable accuracy is maintained. As an example, for stoichiometric methane/air flames a four-step scheme can be deduced [28],

$$\begin{aligned}
&\mathrm{CH_4 + O_2 \rightarrow CO + H_2 + H_2O}\\
&\mathrm{CO + H_2O \rightleftharpoons CO_2 + H_2}\\
&\mathrm{O_2 + 2H_2 \rightarrow 2H_2O}
\end{aligned} \tag{4.5}$$
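Any reduced scheme must conserve each element, which gives a quick consistency check on the global steps of (4.5) as written here. A minimal sketch that counts C, H and O atoms on each side; the parser is a hypothetical helper that handles only simple formulas without parentheses, and the reversible step is written with '->' because only stoichiometry matters for the check.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'CH4' or 'H2O' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_counts(side):
    """Total atoms on one side of a reaction written like 'O2 + 2 H2'."""
    total = Counter()
    for term in side.split("+"):
        m = re.fullmatch(r"\s*(\d*)\s*([A-Za-z0-9]+)\s*", term)
        coeff = int(m.group(1)) if m.group(1) else 1
        for element, n in parse_formula(m.group(2)).items():
            total[element] += coeff * n
    return total


def is_balanced(reaction):
    """True if every element appears equally often on both sides of 'lhs -> rhs'."""
    lhs, rhs = reaction.split("->")
    return side_counts(lhs) == side_counts(rhs)


reduced_scheme = [
    "CH4 + O2 -> CO + H2 + H2O",
    "CO + H2O -> CO2 + H2",
    "O2 + 2 H2 -> 2 H2O",
]
for r in reduced_scheme:
    print(r, is_balanced(r))
```

All three steps balance; replacing CO by CO2 in the first step would not, since the right-hand side would then carry one oxygen atom too many.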

Additional approximations are sometimes possible, permitting analytical treatment of flame structure. Thus, in eq. (4.5), «: kv so that we can define the parameter

(4.6)


Fig. 17. Example of a structural simplification arising from rate-ratio asymptotics. After a figure in ref. 28.
and examine the limit δ → 0 (rate-ratio asymptotics). In this limit the fuel-consumption layer is of vanishing thickness, and its structure can be analyzed using the ansatz

$$[\mathrm{CH_4}]_0 = 0, \qquad T = T_0 + O(\delta), \qquad c_i = c_{i0} + O(\delta) \tag{4.7}$$

(see Fig. 17). Further details may be found in ref. 28.


5 NONLINEAR HIGH-FREQUENCY ACOUSTICS AND COMBUSTION


Auto-ignition is important in many combustion problems. In Section 1 we indicated the role that it can play in one type of deflagration-to-detonation transition, and it is central to engine knock, in which point ignition occurs ahead of the primary flame front. High-frequency waves (generated by turbulence or inhomogeneities) might have a significant impact on this process, and recently there have been some interesting extensions of nonlinear high-frequency acoustic theory to the problem of propagation through reacting gases [29,30].

A periodic sound wave propagating to the right through a uniform time-independent medium (constant background) is described by

$$\mathbf{u} = \mathbf{u}^0 + \epsilon\, e^{i\omega\left(t - x/\sqrt{T^0}\right)}\,\big(1, \sqrt{T^0}, 0, 0\big) + \cdots, \qquad \mathbf{u} = (\rho, v, S, Y)^{T} \tag{5.1}$$

(ρ = density, v = velocity, S = entropy, Y = mass fraction, √T^0 = speed of sound, ( )^0 = background).

If, instead, the background is homogeneous but nonconstant, corresponding to a homogeneous explosion, so that


1 


drii 


Y(y-l)  dr 


m  rt-w*  (5.2) 


(cf. eq. (1.5); this is an early-time approximation valid when reactant depletion can be neglected), then we adopt the ansatz [30]

$$\mathbf{u} = \mathbf{u}^0 + \epsilon\,\sigma(x, t, \theta)\,\big(1, \sqrt{T^0}, 0, 0\big) + \epsilon^2 \mathbf{u}_2 + \cdots \tag{5.3}$$


for small-amplitude high-frequency waves. When substituted into the governing equations, with attention restricted to a single right-moving wave, solutions σ(x, t, θ) valid as ε → 0 satisfy


An appropriate solution of eq. (5.4) is


and it may be noted that in the limit a → 0, dT^0/dt → 0 (vanishing amplitude, constant background) the solution reduces to

$$\sigma = e^{i\theta} \tag{5.7}$$

which recovers eq. (5.1) with ω = ε^{-1}. The nonlinear term in eq. (5.5) will cause dissipation if (and only if) shocks form, but the term on the right can lead to a growth in amplitude.

Nonlinear feedback can occur, with the acoustic signal affecting the mean field (background), if the activation energy is large and

$$\epsilon \to 0, \qquad \theta \to \infty, \qquad \epsilon\theta \ \text{fixed} \tag{5.8}$$

As an example, during the induction phase of an explosion (when the ansatz (1.6) is valid), and for a left-moving wave [29],

o,=5,(x.t)  +  a(0.x.l),  0=^-— 

T,  •=  (y  ~  1)(S,  +  o2  +  5}) 

2y(«i,~  °„)  -  y(y  - 1)«2,  =  2y(5, +  o,_) 

=  ef'{7^} 

9,-0,-  2{(y  +  l)o,  +  (y-  l)o2  +  (y  -  3)o,}<rff 

(5.9) 

Here, all the perturbation quantities can be written in terms of σ_1, σ_2 and σ_3; T_1 is the perturbed mean-field temperature, nonzero because of the nonlinear feedback; and the average is taken over the θ variable. Using these equations it can be shown that ignition (i.e. thermal runaway as identified with the result (1.8)) will occur earlier because of the presence of the acoustic wave.
Abstract


Faeth, G.M., Kounalakis, M.E. and Sivathanu, Y.R., 1991. Stochastic aspects of turbulent combustion processes. Chemometrics and Intelligent Laboratory Systems, 10: 199-210.

Methods of using stochastic simulations to treat nonlinear interactions in turbulent combustion processes are described, emphasizing the use of statistical time-series techniques to analyze the turbulence-radiation interactions of nonpremixed flames. Three aspects of the problem are considered, as follows: the statistics of scalar properties in turbulent flames, the formulation of algorithms to simulate flame radiation based on flame statistics, and evaluation of the methodology using recent measurements for nonluminous flames. It is shown that the process becomes tractable through the laminar flamelet approximation, whereby all scalar properties are taken to be solely functions of a conserved scalar like the mixture fraction. Thus, the simulations are designed to generate realizations of mixture fractions along radiation paths, with the radiation properties of each realization found using a narrow-band radiation model. An autoregressive process that reproduces probability density functions and spatial and temporal correlations of mixture fractions was found to yield reasonably good predictions of the statistical properties of spectral radiation intensities measured for turbulent carbon monoxide and hydrogen jet flames burning in still air. Although the approach appears to be promising, additional development is needed in order to treat some of the unique statistical features of turbulence that are not encountered during conventional use of statistical time-series techniques.


INTRODUCTION 

Stochastic simulations are promising for treating a variety of nonlinear interactions in turbulent flows. Recent studies along these lines include the turbulent dispersion of particles and bubbles [1-5], the motion and transport of drops in evaporating and combusting sprays [6,7], and the turbulence-radiation interactions of nonpremixed flames [8-13]. The objective of the present paper is to describe the application of this methodology to processes encountered in turbulent combusting flows. In order to control the scope, the discussion will focus on turbulence-radiation interactions of nonpremixed (diffusion) flames, since this problem involves the most significant features of stochastic simulations of turbulent combustion processes.

Initially, methods of simulating turbulent processes were relatively ad hoc [1,2]; however, more systematic techniques currently are being emphasized. These include full stochastic simulation of the turbulent field, along the lines of Kraichnan [14], to study the turbulent dispersion of particles in an isotropic turbulent field [15], and adapting statistical time-series techniques, analogous to methods described by Box and Jenkins [15], for problems of turbulent dispersion of particles [3,5] and turbulence-radiation interactions [13]. The present discussion will be limited to statistical time-series techniques, since they have modest computational requirements and provide reasonable flexibility for treating a variety of practical turbulent flows.

The main reason for interest in turbulence-radiation interactions is that radiation levels of turbulent flames are generally higher (often 2-3 times higher) than estimates based on mean scalar properties within the flames [8-12]. The bias of mean radiation levels is caused by nonlinear relationships between scalar and radiation properties in flames. This precludes averaging scalar properties first and then computing radiation properties; instead, the radiation properties of realizations of the scalar field must be found first and then averaged. Properties other than mean radiation levels are also of interest; for example, fire and flame detectors often use the temporal properties of flame radiation fluctuations to distinguish flames from background radiation. Furthermore, maximum (rather than average) flame radiation levels provide the most conservative estimate of flame radiation properties for fire safety considerations. Finally, studying the temporal properties of radiation fluctuations (moments, probability density functions, and temporal power spectral densities) provides information to better understand turbulence-radiation interactions, analogous to the information provided by the temporal properties of velocity and concentration fluctuations to better understand turbulent mixing. Thus, the general problem of turbulence-radiation interactions involves both the mean and fluctuating radiation properties of turbulent flames [11,12].
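The averaging-order argument above holds for any convex scalar-to-radiation relationship. A minimal sketch, using blackbody emissive power σT⁴ as a stand-in for a nonlinear radiation property and hypothetical Gaussian temperature realizations clipped to a physical range: the mean of the realizations' emission exceeds the emission evaluated at the mean temperature, as Jensen's inequality guarantees for a convex function. Real flame radiation also depends on species concentrations and path absorption, which are ignored here, so the size of the bias is only illustrative.

```python
import numpy as np

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def emissive_power(T):
    """Blackbody emissive power, a strongly nonlinear (T**4) function of temperature."""
    return SIGMA * np.asarray(T, dtype=float) ** 4


# Hypothetical temperature realizations at one point in a turbulent flame (K).
rng = np.random.default_rng(1)
T = np.clip(rng.normal(1500.0, 300.0, size=100_000), 300.0, 2400.0)

radiation_of_mean = emissive_power(T.mean())   # average the scalar first (biased low)
mean_of_radiation = emissive_power(T).mean()   # average over realizations (correct order)
print(mean_of_radiation / radiation_of_mean)   # exceeds 1 for a convex property
```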

Statistical time-series simulations of the radiation properties of turbulent flames are based on simulation of scalar properties within the flames. Therefore, the paper begins with a description of the statistics of scalar properties in turbulent flames. The formulation of typical stochastic simulations is then considered. The paper concludes with evaluation of the methodology using measurements from turbulent hydrogen and carbon monoxide jet flames burning in still air.


SCALAR  PROPERTIES  OF  DIFFUSION  FLAMES 
Scalar  property  correlations 

Assuming equal exchange coefficients of all species and heat, negligible effects of potential and kinetic energies and radiation, and reaction occurring at an infinitely-thin flame sheet, Burke and Schumann [16] showed that scalar properties in laminar nonpremixed flames were functions (called state relationships) of any one of a number of conserved scalars. Although the formal requirements are rather restrictive, state relationships have been found for many laminar flame systems and are widely used for analysis of flame structure and radiation properties. The use of state relationships has also been extended to turbulent nonpremixed flames, since they generally can be approximated as wrinkled laminar flames. The use of state relationships for turbulent nonpremixed flames has come to be called the conserved-scalar formalism under the laminar flamelet approximation [17,18].

Typical state relationships are illustrated in Fig. 1. This involves measurements of the concentrations of major gas species and temperature, T, for radial traverses at various heights, x, above a burner having diameter, d, as well as axial traverses, within laminar nonpremixed carbon monoxide/air flames having various burner Reynolds numbers, Re. In this case, the conserved scalar is the local fuel-equivalence ratio (the mass fraction of fuel elements irrespective of species divided by the stoichiometric mass fraction of fuel elements). Predictions based on the assumption of local thermodynamic equilibrium for an adiabatic flame, using the Gordon and McBride [19] algorithm, are also shown on the figure. Aside from temperature (where radiative heat losses and errors of uncorrected temperature measurements are a factor), the measured state relationships are seen to be in excellent agreement with equilibrium predictions. Thus, the tendency of reactive systems to approach equilibrium provides a physical justification for the laminar flamelet approximation in this instance.

Fig. 1. State relationships for carbon monoxide/air diffusion flames (abscissa: fuel-equivalence ratio), from Gore et al. [8].

State relationships for the concentrations of major gas species and temperature, adequate for estimates of structure and radiation properties, have been found from measurements in laminar flames for a variety of fuels burning in air: hydrogen [9,17], methane [18,20,21], propane [22], n-heptane [17,23], acetylene [11] and ethylene [10]. Hydrocarbons exhibit significant departures from local thermodynamic equilibrium at fuel-rich conditions due to effects of finite-rate chemistry associated with soot processes; however, these departures are still relatively universal, so that adequate state relationships are still found except near points of flame attachment. Finally, generalized state relationships have been found for hydrocarbon/air flames, so that tedious measurements to find state relationships for specific fuels can be avoided [22].

Application of the conserved-scalar formalism and the laminar flamelet approximation to find the structure of turbulent flames has been reasonably successful for virtually all the materials for which state relationships are available [8-13,17,24,25]. Recent studies also suggest that state relationships for soot volume fractions, an important property for estimates of continuum radiation from soot, exist in turbulent flames having sufficiently long residence times [26,27]. This implies that scalar properties needed to estimate radiation are strongly correlated through their state relationships and can be simulated by simulating a conserved scalar alone.
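Under the laminar flamelet approximation every scalar becomes a lookup in the conserved scalar. A minimal sketch with a hypothetical Burke-Schumann-type state relationship, in which temperature varies piecewise-linearly with mixture fraction f and peaks at the stoichiometric value F_ST; the numbers are illustrative assumptions, not the measured relationships of Fig. 1. Simulated realizations of f along a radiation path can then be mapped point by point, and other scalars would follow from their own tables in the same way.

```python
import numpy as np

# Hypothetical state-relationship parameters (illustrative only, not measured data).
F_ST, T_AIR, T_FUEL, T_PEAK = 0.055, 300.0, 300.0, 2200.0


def temperature_state_relationship(f):
    """Temperature as a function of mixture fraction alone (laminar flamelet assumption).

    Piecewise-linear in f with a peak at stoichiometric, as for a Burke-Schumann flame sheet.
    """
    return np.interp(f, [0.0, F_ST, 1.0], [T_AIR, T_PEAK, T_FUEL])


# Mixture-fraction realizations along a path map directly to temperatures.
f_path = np.array([0.0, 0.02, 0.055, 0.2, 1.0])
print(temperature_state_relationship(f_path))
```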

Mixture  fraction  statistics 

Mixture fraction, f, defined as the fraction of elemental mass that originated from the fuel, is the conserved scalar most commonly used to find the scalar structure of turbulent nonpremixed flames. Turbulence models under the conserved-scalar formalism are designed to provide estimates of the mean value and variance of mixture fractions [17,28]. Methods used to estimate the other statistical properties needed to simulate mixture fraction distributions along radiation paths, namely probability density functions and correlations, will be considered in the following.

A fuel burning in air involves instantaneous properties at any point that can be pure air, pure fuel or some mixture of the two, with scalar properties given by the state relationships. Several probability density functions (PDFs) of mixture fraction, P(f), have been proposed to accommodate these possibilities, but the clipped-Gaussian PDF has received the most attention [28]. This involves a Gaussian function defined in the range 0 < f < 1, with the tails of the distribution replaced by Dirac delta functions at f = 0 and 1 that have weights equal to the probability of f < 0 and f > 1, respectively, for the original Gaussian distribution. Thus, the air intermittency of the flame at any point, defined as the fraction of time spent in ambient air, is given by the weight of the Dirac delta function at f = 0.
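The clipped-Gaussian construction above reduces to a few closed-form expressions. A minimal sketch, assuming the underlying Gaussian has mean mu and standard deviation sigma (hypothetical parameter names): the delta weights are the Gaussian tail probabilities, and the mean of the clipped distribution adds the interior contribution to the weight at f = 1. Realizations can be drawn simply by clipping Gaussian samples to [0, 1], which places the tail mass exactly at the two deltas. Fitting mu and sigma to a measured mean and variance, as done for Fig. 2, would additionally require a root solve that is not shown.

```python
import math

import numpy as np


def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def clipped_gaussian_weights(mu, sigma):
    """Delta weights at f = 0 (air intermittency) and f = 1 (fuel intermittency)."""
    return _cdf(-mu / sigma), 1.0 - _cdf((1.0 - mu) / sigma)


def clipped_gaussian_mean(mu, sigma):
    """Mean of f: integral of f*P(f) over 0 < f < 1 plus 1 times the delta weight at f = 1."""
    a, b = -mu / sigma, (1.0 - mu) / sigma
    interior = mu * (_cdf(b) - _cdf(a)) - sigma * (_pdf(b) - _pdf(a))
    _, p1 = clipped_gaussian_weights(mu, sigma)
    return interior + p1


def sample_clipped_gaussian(mu, sigma, n, seed=0):
    """Realizations of f: clipping Gaussian samples to [0, 1] puts the tail mass at the deltas."""
    rng = np.random.default_rng(seed)
    return np.clip(rng.normal(mu, sigma, n), 0.0, 1.0)


print(clipped_gaussian_weights(0.1, 0.2), clipped_gaussian_mean(0.1, 0.2))
```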

Recent measurements in noncombusting and combusting turbulent flows suggest that the clipped-Gaussian PDF of mixture fraction is reasonable [29,30]. Some typical results are illustrated in Fig. 2 for turbulent carbon monoxide jet flames burning in still air. The measurements in the figure are fitted with clipped-Gaussian PDFs having the same mean values and variances. Results are shown for various radial positions, r, before and after the flame tip (x/d = 30 and 50). The air intermittency spike is prominent for these conditions, but the fuel intermittency spike can only be seen in the fitted PDFs near the axis at x/d = 30. The main deficiency of the clipped-Gaussian fits is that they fail to represent the broadened air intermittency spike caused by direct mixing between turbulent fluid and air near the edge of the flow (the air superlayer). A PDF having additional moments is needed to correct this problem; however, the complication of finding additional moments has not been pursued pending evaluation of the performance of the two-moment PDF. Notably, the functions used for mixture fraction PDFs normally do not have a strong effect upon predictions of scalar properties in turbulent flames [28].

Correlations of mixture fraction fluctuations, f', have been measured for turbulent jet-like flows for both noncombusting [31,32] and combusting [30] conditions. Some typical spatial correlations are illustrated in Fig. 3 for a carbon monoxide jet diffusion flame burning in still air. These results involve two-point spatial correlations of mixture fraction fluctuations for horizontal radial paths through the flame axis at positions before, near, and after the flame tip (x/d = 30, 40, and 50).


Fig. 2. Typical probability density functions of mixture fraction for a turbulent carbon monoxide/air diffusion flame. From Kounalakis and Faeth [30].


Fig. 3. Spatial correlations of mixture fraction fluctuations for a turbulent carbon monoxide/air diffusion flame. From Kounalakis and Faeth [30].

The correlations are plotted as a function of $\Delta r/\Lambda_r$, where $\Delta r$ is the distance between the points and $\Lambda_r$ is the spatial integral scale in the radial direction. The spatial correlations exhibit remarkably little variation with either radial or axial position when plotted in this manner. A simple exponential fit of the spatial correlation,

$$\frac{\overline{f'(r)\,f'(r+\Delta r)}}{\left(\overline{f'^2(r)}\;\overline{f'^2(r+\Delta r)}\right)^{1/2}} = \exp(-\Delta r/\Lambda_r) \qquad (1)$$

is also shown in Fig. 3. The exponential function provides a reasonably good fit of the measurements. This is partly due to experimental limitations, since the spatial resolution was not sufficient to resolve the smallest scales of the flow, which are expected to modify the correlation as $\Delta r \to 0$ [30]. Nevertheless, the exponential expression provides a good representation of the larger scales that contain most of the signal energy and are expected to have the greatest influence on turbulence-radiation interactions. It should be noted, however, that these results differ from earlier findings in nearly constant density jets, where radial correlations of mixture fraction fluctuations had the shape of a Frenkiel function [31,32]; these differences between combusting and noncombusting conditions must still be resolved.
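A fit like eq. (1) can be obtained from measured correlation coefficients in several ways. As a minimal sketch (not the procedure used in [30]), the Python function below estimates the integral scale by a least-squares fit of ln ρ against Δr constrained to pass through the origin; the function name and the synthetic input values are illustrative assumptions, not data from the flames discussed here.

```python
import math

def fit_exponential_scale(separations, correlations):
    """Least-squares estimate of the integral scale L in rho = exp(-dr / L).

    Fits ln(rho) = -dr / L by a straight line through the origin.
    Points with rho <= 0 cannot be log-transformed and are skipped.
    """
    num = 0.0  # sum of dr * ln(rho)
    den = 0.0  # sum of dr**2
    for dr, rho in zip(separations, correlations):
        if rho > 0.0:
            num += dr * math.log(rho)
            den += dr * dr
    if den == 0.0 or num >= 0.0:
        raise ValueError("need separations with 0 < rho < 1")
    # The fitted slope num/den equals -1/L, so L = -den/num
    return -den / num

# Synthetic correlation values (not measurements) built with an assumed scale
true_scale = 0.02  # m, assumed
dr = [0.005 * k for k in range(1, 9)]
rho = [math.exp(-d / true_scale) for d in dr]
print(fit_exponential_scale(dr, rho))  # recovers approximately 0.02
```

Because the slope estimate weights each point by Δr², the larger separations that carry most of the signal energy dominate the fit, while the poorly resolved region near Δr = 0 has little influence.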


Temporal correlations of mixture fraction fluctuations have been measured for the turbulent carbon monoxide jet diffusion flames as well [30]. These results were also relatively independent of position and could be correlated by an exponential function analogous to eq. (1), with time differences, $\Delta t$, normalized by the integral time scale, $\tau_t$ (subject to the same limitations as eq. (1) near $\Delta t = 0$). The exponential form of the low-resolution temporal correlation measurements agrees with earlier findings for noncombusting flows [32].
With exponential functions established as reasonable approximations of spatial and temporal correlations of mixture fraction fluctuations, the next problem is specification of the integral scales. Measurements of these scales for turbulent carbon monoxide jet diffusion flames are illustrated in Fig. 4. The scales are normalized as $\Lambda_r/x$ and as $\tau_t u_0/(x - x_0)$, where $u_0$ is the average velocity at the burner exit and $x_0$ is a virtual origin at $x_0/d = 13$. When correlated in this manner, the measurements tend to collapse to single curves for a range of flame positions. The spatial integral scales are relatively independent of radial position and can be correlated as $\Lambda_r/x = 0.017$. In contrast, $\tau_t$ is smallest at the axis. This behavior can be explained through Taylor's hypothesis, i.e., $\tau_t \sim \Lambda_r/\bar{u}$, where $\bar{u}$ is the local time-averaged streamwise velocity: $\Lambda_r$ is nearly independent of radial position, while $\bar{u}$ is a maximum at the axis.

[Figure: integral scales plotted against r/x for x/d = 30, 40 and 50]

Fig. 4. Temporal and spatial integral scales in a turbulent carbon monoxide/air diffusion flame. From Kounalakis and Faeth [30].
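The two scaling results above lend themselves to quick order-of-magnitude estimates. The sketch below assumes that the correlation $\Lambda_r/x = 0.017$ read from Fig. 4 applies, and uses Taylor's hypothesis for the temporal scale; the burner diameter, axial station and local velocities are hypothetical values chosen only to show the arithmetic.

```python
def spatial_integral_scale(x):
    """Radial integral scale from the Fig. 4 correlation Lambda_r / x = 0.017 (x in m)."""
    return 0.017 * x

def temporal_integral_scale(x, u_local):
    """Taylor's hypothesis, tau_t ~ Lambda_r / u, with u the local mean streamwise velocity."""
    if u_local <= 0.0:
        raise ValueError("Taylor's hypothesis needs a positive mean velocity")
    return spatial_integral_scale(x) / u_local

# Hypothetical inputs, chosen only to illustrate the arithmetic
d = 0.005      # burner diameter, m (assumed)
x = 40 * d     # axial station x/d = 40
for label, u in (("axis", 4.0), ("off axis", 2.0)):   # assumed mean velocities, m/s
    print(f"{label}: Lambda_r = {spatial_integral_scale(x) * 1e3:.1f} mm, "
          f"tau_t = {temporal_integral_scale(x, u) * 1e3:.2f} ms")
```

With these assumed inputs the sketch gives $\Lambda_r \approx 3.4$ mm at both positions, while $\tau_t$ is about 0.85 ms on the axis and 1.7 ms off axis, reproducing the trend that $\tau_t$ is smallest where the mean velocity is largest.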

Results concerning mixture fraction statistics in Figs. 2-4 were generally preserved as the Reynolds number of the carbon monoxide flames was increased [30]. Nevertheless, this only represents fragmentary findings for a single reactant combination, and generalization is needed to treat other flame systems. One proposal has been to assume that the radial spatial integral scale is proportional to the local dissipation length scale [12,13], as follows:

$$\Lambda_r = C_c\,C_\mu^{3/4}\,k^{3/2}/\varepsilon \qquad (2)$$

where $C_c$ is an empirical constant having a value in the range 5-7, $C_\mu$ is a turbulence modeling constant having a value of 0.09, and $k$ and $\varepsilon$ are the mass-weighted (Favre) averaged turbulence kinetic energy and its rate of dissipation, found from structure predictions using a turbulence model. The temporal integral scale was then estimated using Taylor's hypothesis, while assuming that the streamwise and radial scales were the same, as follows:

$$\tau_t = \Lambda_r/\tilde{u} \qquad (3)$$

where $\tilde{u}$ is the mass-weighted (Favre) averaged mean velocity in the streamwise direction. Eqs. (2) and (3) are consistent with the results illustrated in Fig. 4, but additional study of the approximations is certainly needed. For lack of an alternative, eqs. (2) and (3) will be used to find integral scales in the following.


STOCHASTIC  SIMULATION 
Formulation 

The stochastic simulation provides realizations of mixture fraction distributions along radiation paths through the flow. Given the mixture fractions, the state relationships provide all other scalar properties, so that spectral radiation intensities can be calculated from a narrow-band radiation model for each distribution. The resulting ensemble, or time series, of spectral radiation intensities is then used to compute moments, PDFs, correlations and power spectra of spectral radiation intensities in the usual manner. The formulation of the simulation and the narrow-band radiation model are discussed in the following.

It will be assumed that the statistical properties of mixture fractions are known along the radiation path. This includes $P(f)$ (taken to be a clipped-Gaussian function), spatial correlations, and temporal correlations if temporal properties are needed (both taken to be exponential functions). Aside from isolated cases where measurements are available [30], these properties must be estimated from a model of the turbulent combustion process. For flows having relatively high Reynolds numbers, this is generally done using a turbulence model. Fortunately, for relatively simple flame geometries, like buoyant jet flames, turbulence models provide reasonably good estimates of scalar properties, including mean and fluctuating mixture fractions [8-13,24,25]. The necessary statistical properties of mixture fractions are then found as described earlier.

Due to the exponential form of the mixture fraction correlations, it is most convenient to carry out the simulation as an autoregressive process [15]. This involves finding the mixture fraction fluctuation at any point as a weighted sum of fluctuations at other points and a random shock. A procedure of this type encounters difficulties with any finite-range PDF, since the fluctuation algorithm can easily generate a value of the variable that lies beyond the range of the PDF. This is handled by transforming the simulation from $f$, which has a clipped-Gaussian PDF, to a corresponding Gaussian random variable $z$, with appropriate moments to match $P(f)$, so that

$$f = z,\; 0 \le z \le 1; \qquad f = 0,\; z < 0; \qquad f = 1,\; z > 1 \qquad (4)$$

Since the PDFs of $f$ and $z$ are not the same, the correlations of $f$ and $z$ differ as well. Methods to find the appropriate correlations for $z$ will be taken up later.
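The transformation of eq. (4) maps a Gaussian $z$ onto a clipped-Gaussian $f$, with delta functions at $f = 0$ and $f = 1$. A minimal Python sketch, using only the standard library (`math.erf`, `random.gauss`), evaluates the resulting mean and variance of $f$ in closed form and checks them against direct sampling; the function names and the example values $\bar z = 0.2$, $(\overline{z'^2})^{1/2} = 0.3$ are assumptions for illustration.

```python
import math
import random

def _pdf(t):
    """Standard normal probability density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def _cdf(t):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def clip_to_mixture_fraction(z):
    """Eq. (4): f = z for 0 <= z <= 1, f = 0 for z < 0, f = 1 for z > 1."""
    return min(1.0, max(0.0, z))

def clipped_gaussian_moments(z_mean, z_std):
    """Mean and variance of f = clip(z, 0, 1) for z ~ N(z_mean, z_std**2)."""
    a = (0.0 - z_mean) / z_std
    b = (1.0 - z_mean) / z_std
    mass_in = _cdf(b) - _cdf(a)   # probability that 0 <= z <= 1
    spike_one = 1.0 - _cdf(b)     # probability lumped into the delta function at f = 1
    # First and second partial moments of z over [0, 1]
    m1 = z_mean * mass_in + z_std * (_pdf(a) - _pdf(b))
    m2 = (z_mean ** 2 * mass_in
          + 2.0 * z_mean * z_std * (_pdf(a) - _pdf(b))
          + z_std ** 2 * (mass_in + a * _pdf(a) - b * _pdf(b)))
    f_mean = m1 + spike_one       # the spike at f = 0 contributes nothing
    f_sq = m2 + spike_one
    return f_mean, f_sq - f_mean ** 2

# Check against Monte Carlo sampling of the transformation (assumed z statistics)
rng = random.Random(1)
samples = [clip_to_mixture_fraction(rng.gauss(0.2, 0.3)) for _ in range(200_000)]
mc_mean = sum(samples) / len(samples)
mc_var = sum((s - mc_mean) ** 2 for s in samples) / len(samples)
print(clipped_gaussian_moments(0.2, 0.3), (mc_mean, mc_var))
```

Matching a prescribed $P(f)$ means inverting these two relations for $\bar z$ and $\overline{z'^2}$ given $\bar f$ and $\overline{f'^2}$, for example with a two-dimensional root finder; that step is not shown.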

Values of $z$ are simulated at a number of points along the radiation path. Following Box and Jenkins [15], the value of the fluctuation of $z$ at point $i$, $z'_i$, is found as a weighted sum of fluctuations found earlier, $z'_j$, where $j = i-1, \ldots, p$, and a random shock, $a_i$, as follows:

$$z'_i = \sum_{j=p}^{i-1} \phi_{ij}\, z'_j + a_i \qquad (5)$$

The index $p$ is selected to eliminate points having small correlation coefficients with respect to point $i$. The $\phi_{ij}$ are weighting factors chosen so that the simulation satisfies the correlations between fluctuations at the various points appearing in eq. (5). The parameter $a_i$ is an uncorrelated Gaussian random variable having a mean value of zero and a variance selected so that the simulation satisfies $P(z_i)$.

Box and Jenkins [15] derive expressions for the $\phi_{ij}$ and the variance of $a_i$, $\sigma_{a_i}^2$, as follows:

$$\overline{z'_i z'_k} = \sum_{j=p}^{i-1} \phi_{ij}\,\overline{z'_j z'_k}, \qquad k = p, \ldots, i-1 \qquad (6)$$

$$\sigma_{a_i}^2 = \overline{z'^2_i} - \sum_{j=p}^{i-1} \phi_{ij}\,\overline{z'_i z'_j} \qquad (7)$$

With the correlations between the various points known, eqs. (6) provide $i-p$ linear equations, called the Yule-Walker equations, needed to find the $\phi_{ij}$. This system of equations has a symmetric positive definite matrix and can be solved readily using Cholesky factorization. Given the $\phi_{ij}$, $\sigma_{a_i}^2$ can be found, since all quantities on the right-hand side of eq. (7) are known.
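The Yule-Walker system of eq. (6) and the shock variance of eq. (7) can be sketched in a few lines of NumPy. This is an illustrative implementation, assuming the covariance matrix of the $z$ fluctuations along the path is already known and using 0-based indices; it factors the symmetric positive definite matrix with `numpy.linalg.cholesky` and then performs the two triangular solves (here with the general `numpy.linalg.solve`, for brevity).

```python
import numpy as np

def yule_walker_weights(cov, i, p):
    """Weights phi_ij (j = p..i-1) and shock variance for point i, eqs. (5)-(7).

    cov is the covariance matrix of the z fluctuations at all points on the
    path (0-based indices).  Eq. (6) is the symmetric positive definite
    system A phi = b with A[k, j] = cov[j, k] and b[k] = cov[i, k].
    """
    idx = np.arange(p, i)
    A = cov[np.ix_(idx, idx)]
    b = cov[i, idx]
    L = np.linalg.cholesky(A)       # A = L L^T
    y = np.linalg.solve(L, b)       # forward substitution, L y = b
    phi = np.linalg.solve(L.T, y)   # back substitution, L^T phi = y
    sigma_a2 = cov[i, i] - phi @ b  # eq. (7)
    return phi, sigma_a2

# Assumed example: unit-variance z, six points spaced half an integral scale apart
r = 0.5 * np.arange(6)
cov = np.exp(-np.abs(r[:, None] - r[None, :]))
phi, sigma_a2 = yule_walker_weights(cov, i=5, p=0)
print(phi, sigma_a2)
```

For an exponential correlation sampled at equal spacing, the weights collapse (up to round-off) onto the nearest previous point, $\phi = e^{-0.5} \approx 0.607$, with $\sigma_a^2 = 1 - e^{-1} \approx 0.632$; this Markov property is what the simplified time-dependent formulation of eqs. (9) and (10) relies on.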

A time-independent simulation is initiated by making a random selection for point 1, noting that $z'_1 = a_1$ from eq. (5) and $\sigma_{a_1}^2 = \overline{z'^2_1}$ from eq. (7). The regression relationships are then successively applied to find the remaining $z'_i$ along the radiation path. Finally, the $f_i$ are found from eq. (4), noting that $z_i = \bar{z}_i + z'_i$, followed by computation of spectral radiation intensities for this realization, as described earlier. This process is repeated a sufficient number of times to obtain statistically significant radiation properties.
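The time-independent procedure can be sketched as follows, under stated assumptions: the covariance matrix of the $z$ fluctuations and the means $\bar z_i$ are given at every point, only a fixed number of previous points (`window`, a hypothetical parameter standing in for the choice of $p$) enter each regression, and the radiation calculation for each realization is left out. Since the regression coefficients depend only on the correlations, they are computed once and all realizations are generated together.

```python
import numpy as np

def regression_coefficients(cov, window):
    """Weights (eq. 6) and shock standard deviations (eq. 7) for every point.

    `window` plays the role of i - p: only that many previous points are kept.
    """
    coeffs = []
    for i in range(cov.shape[0]):
        p = max(0, i - window)
        if p == i:  # first point (or window = 0): z'_i = a_i with variance z'^2_i
            coeffs.append((p, np.zeros(0), np.sqrt(cov[i, i])))
            continue
        b = cov[i, p:i]
        phi = np.linalg.solve(cov[p:i, p:i], b)  # SPD system; Cholesky also works
        coeffs.append((p, phi, np.sqrt(cov[i, i] - phi @ b)))
    return coeffs

def simulate_path(z_mean, cov, n_real, window=20, seed=0):
    """Realizations of f along the path; rows are realizations, columns are points."""
    rng = np.random.default_rng(seed)
    zp = np.zeros((n_real, cov.shape[0]))   # z fluctuations
    for i, (p, phi, sigma_a) in enumerate(regression_coefficients(cov, window)):
        shock = sigma_a * rng.standard_normal(n_real)   # random shock a_i
        zp[:, i] = zp[:, p:i] @ phi + shock             # eq. (5)
    return np.clip(z_mean + zp, 0.0, 1.0)              # eq. (4) with z = zbar + z'

# Assumed inputs: 30 points a quarter integral scale apart, uniform statistics
r = 0.25 * np.arange(30)
cov = 0.2 ** 2 * np.exp(-np.abs(r[:, None] - r[None, :]))
f = simulate_path(np.full(30, 0.4), cov, n_real=5000)
print(f.mean(axis=0)[:3], f.std(axis=0)[:3])
```

With $\bar z = 0.4$ and $(\overline{z'^2})^{1/2} = 0.2$ in this assumed example, some samples are clipped, so the statistics of $f$ differ slightly from those of $z$; that discrepancy is what the correlation corrections discussed later address.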

The previously computed points in the regression process of eq. (5) only enter the calculations through their correlations; therefore, time-dependent simulations are essentially the same as time-independent simulations after appropriately numbering points to keep track of them in space and time. This involves realizations of $f$ along the radiation path at times $\Delta t$ apart. The simulation is initiated by finding a realization using the time-independent solution. Realizations are then found at subsequent times considering correlations with all previous realizations, until temporal correlations are properly represented. Subsequently, for computational efficiency, the points at the earliest time are dropped when calculations for the next time are begun.

The main new difficulty with the time-dependent simulation is that two-point, two-time correlations are needed. Information of this type is not available; therefore, the following ad hoc approximation has been adopted for lack of an alternative [13]:

$$\overline{z'_i(t)\, z'_j(t - k\Delta t)} = R_i(k\Delta t)\,\overline{z'_i z'_j} \qquad (8)$$

where $R_i(k\Delta t)$ is the temporal correlation coefficient of $z_i$ fluctuations at a time delay of $k\Delta t$. Naturally, it would be just as plausible to use $R_j(k\Delta t)\,\overline{z'_i z'_j}$ on the right-hand side of eq. (8) for a stationary turbulent flow. The difference between these possibilities provides a measure of the potential errors resulting from the use of eq. (8). Since $\Lambda_r$ is nearly constant over a cross-section of the flow, eq. (3) indicates that errors are greatest in regions where $\tilde{u}$ varies rapidly. Fortunately, spatial correlations become small for separation distances of the order of $\Lambda_r$, and $\tilde{u}$ does not vary significantly over such distances, providing some justification for the approximation.

When temporal correlations are exponential, use of eq. (8) for two-point, two-time correlations leads to a substantial simplification of time-dependent simulations. Carrying out a derivation similar to that of Box and Jenkins [15] for a pure time series with stationary statistics and an exponential temporal correlation yields similar results for the combined spatial/temporal simulation with temporal correlations varying according to eq. (8): namely, $\phi_{ij} = 0$ for all points at times earlier than $t - \Delta t$. Thus, only the realization at $t - \Delta t$ must be retained while developing the realization at $t$, vastly reducing the storage and computational requirements of the simulation.

Another useful simplification is that radiation predictions are relatively insensitive to the functional form of the spatial correlation, since they are found by integrating properties along a radiation path [13,33]. Thus, temporal simulations using statistically independent points spaced a distance $\Lambda_r$ apart along the radiation path yielded results that were essentially the same as simulations that satisfied twenty-point fits of spatial correlations along the radiation paths [13]. This simplification reduces the simulation to a first-order (Markov) process in time at each point, for an exponential temporal correlation, yielding [15]:

$$z'_i(t) = R_i(\Delta t)\, z'_i(t - \Delta t) + a_i(t) \qquad (9)$$

where

$$\sigma_{a_i}^2 = \left(1 - R_i^2(\Delta t)\right)\overline{z'^2_i} \qquad (10)$$
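With statistically independent points, eqs. (9) and (10) reduce the time-dependent simulation to one first-order autoregressive recursion per point. A minimal NumPy sketch, assuming an exponential temporal correlation $R_i(\Delta t) = \exp(-\Delta t/\tau_i)$ and illustrative parameter values (the function name is hypothetical), is:

```python
import numpy as np

def markov_time_series(z_var, tau, dt, n_steps, seed=0):
    """First-order (Markov) simulation of z fluctuations at independent points, eqs. (9)-(10).

    z_var and tau may be scalars or arrays with one entry per point; the
    temporal correlation is assumed exponential, R(dt) = exp(-dt / tau).
    """
    rng = np.random.default_rng(seed)
    z_var = np.atleast_1d(np.asarray(z_var, dtype=float))
    tau = np.atleast_1d(np.asarray(tau, dtype=float))
    n_points = np.broadcast(z_var, tau).shape[0]
    R = np.exp(-dt / tau)                        # R_i(dt)
    sigma_a = np.sqrt((1.0 - R ** 2) * z_var)    # eq. (10)
    z = np.empty((n_steps, n_points))
    z[0] = np.sqrt(z_var) * rng.standard_normal(n_points)  # start from the stationary PDF
    for k in range(1, n_steps):
        z[k] = R * z[k - 1] + sigma_a * rng.standard_normal(n_points)  # eq. (9)
    return z

# Assumed values: two points (e.g. axis and off axis) with different integral time scales
z = markov_time_series(z_var=0.04, tau=[0.5e-3, 1.0e-3], dt=1e-4, n_steps=50_000)
lag = 5  # 0.5 ms
for col in range(2):
    x = z[:, col]
    print(np.corrcoef(x[:-lag], x[lag:])[0, 1])  # approx exp(-1) and exp(-0.5)
```

Choosing the shock variance from eq. (10) keeps $\overline{z'^2_i}$ stationary, and starting from a sample of the stationary distribution avoids a warm-up transient.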

Correlation  corrections 

Initial time-series simulations of mixture fraction distributions involved the approximation that the correlations of $f$ and $z$ were the same [12,13]. This was adequate in most regions of the flames, but discrepancies between actual and simulated correlations of mixture fraction fluctuations were significant in regions where either air or fuel intermittencies were high [13]. The cause of the difficulty is the transformation from $f$ to $z$, since $z$ has an infinite range while $0 \le f \le 1$. This implies that the correlations of the fluctuations of $z$ must be corrected in order to properly simulate the correlations of the fluctuations of $f$.

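A generalized correction of the $z$ correlations has been developed for any two points, $i > j$, having identical mean and fluctuating mixture fractions, $\bar{f}_i = \bar{f}_j = \bar{f}$ and $\overline{f'^2_i} = \overline{f'^2_j} = \overline{f'^2}$, i.e., for temporal correlations at stationary conditions. The simulation is carried out with the $z$ variable, where $\bar{z}_i = \bar{z}_j = \bar{z}$ and $\overline{z'^2_i} = \overline{z'^2_j} = \overline{z'^2}$ can be found from the transformation of eq. (4). In order for the simulation to yield the correct correlation, $\overline{f'_i f'_j}$, the value of $\overline{z'_i z'_j}$ must be corrected so that the following equation is satisfied:

$$\overline{f'_i f'_j} + \bar{f}^2 = \int_{-\infty}^{\infty} f(z_i)\,P(z_i) \left[\int_{-\infty}^{\infty} f(z_j)\,P(z_j \mid z_i)\,dz_j\right] dz_i \qquad (11)$$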


where $f(z_i)$ and $f(z_j)$ are obtained from eq. (4) and $P(z_j \mid z_i)$ is the probability density function of $z_j$ given $z_i$. Now, the correct correlation for the $z$ variables can be found by considering an autoregressive process between the two points under the present approximations, as follows:

$$z_j - \bar{z} = (z_i - \bar{z})\,\frac{\overline{z'_i z'_j}}{\overline{z'^2}} + a_j \qquad (12)$$

where $a_j$ has a Gaussian PDF with zero mean and variance

$$\sigma_{a_j}^2 = \overline{z'^2} - \frac{\left(\overline{z'_i z'_j}\right)^2}{\overline{z'^2}}$$

Then, for any realization of $z_i$, $P(z_j \mid z_i)$ is a Gaussian distribution having a mean value of $\bar{z} + z'_i\,\overline{z'_i z'_j}/\overline{z'^2}$ and a variance of $\sigma_{a_j}^2$, while $P(z_i)$ is a Gaussian distribution having a mean value of $\bar{z}$ and a variance of $\overline{z'^2}$. Substituting these expressions, along with $f(z_i)$ and $f(z_j)$ from eq. (4), into eq. (11) yields an expression relating $\overline{f'_i f'_j}$ and $\overline{z'_i z'_j}$. This expression must be evaluated numerically for a clipped-Gaussian $P(f)$. The procedure was to select values of $\bar{z}$, $\overline{z'^2}$ and $\overline{z'_i z'_j}$ and then find the corresponding values of $\bar{f}$, $\overline{f'^2}$, and $\overline{f'_i f'_j}$. Present results were found by integrating over the region within 5 standard deviations of the mean of the PDFs.
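The numerical evaluation of eq. (11) with the conditional Gaussian of eq. (12) can be sketched as follows. Instead of integrating over a fixed region of standard deviations as described above, this sketch substitutes Gauss-Hermite quadrature (`numpy.polynomial.hermite.hermgauss`) and finds the corrected $z$ correlation by bisection; both choices, the function names, and the example statistics are assumptions rather than the authors' procedure. It works with correlation coefficients at a single lag, for the stationary case of equal statistics at the two points.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def f_statistics(z_mean, z_std, rho_z, n=80):
    """Mean, variance and correlation coefficient of f = clip(z, 0, 1) at two points.

    Evaluates eq. (11) by Gauss-Hermite quadrature.  z_i ~ N(z_mean, z_std**2)
    and, following eq. (12), z_j given z_i is Gaussian with mean
    z_mean + rho_z (z_i - z_mean) and variance (1 - rho_z**2) z_std**2.
    """
    t, w = hermgauss(n)          # nodes and weights for the weight function exp(-t**2)
    w = w / np.sqrt(np.pi)       # rescale so the weights integrate a standard normal
    s = np.sqrt(2.0) * z_std
    zi = z_mean + s * t
    fi = np.clip(zi, 0.0, 1.0)
    f_mean = w @ fi
    f_var = w @ fi ** 2 - f_mean ** 2
    # z_j at every (outer node, inner node) pair
    zj = z_mean + rho_z * (zi[:, None] - z_mean) + s * np.sqrt(1.0 - rho_z ** 2) * t[None, :]
    fj = np.clip(zj, 0.0, 1.0)
    mean_fifj = w @ (fi[:, None] * fj) @ w   # right-hand side of eq. (11)
    return f_mean, f_var, (mean_fifj - f_mean ** 2) / f_var

def corrected_rho_z(z_mean, z_std, rho_f_target, tol=1e-10):
    """Bisection for the z correlation that reproduces a target f correlation."""
    lo, hi = 0.0, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_statistics(z_mean, z_std, mid)[2] < rho_f_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed case with strong clipping: z_mean = 0.1, z_std = 0.3, target f correlation exp(-1)
rho_f = np.exp(-1.0)
rho_z = corrected_rho_z(0.1, 0.3, rho_f)
# For exponential correlations at a common lag, tau_f / tau_z = ln(rho_z) / ln(rho_f)
print(rho_z, np.log(rho_z) / np.log(rho_f))
```

Since clipping can only weaken the correlation of a bivariate Gaussian pair, the corrected $z$ correlation exceeds the target $f$ correlation, which translates into $\tau_f/\tau_z < 1$, in line with the trend described for Fig. 5. Quadrature accuracy is limited by the kinks of the clipping function at $z = 0$ and $z = 1$, so the node count `n` may need to be raised for demanding cases.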

Since the temporal correlations of $f$ are exponential, it was convenient to fit the correlations of $z$ in the same manner and to express the corrections of the correlations as ratios between the temporal integral scales of $f$ and $z$, $\tau_f/\tau_z$. This ratio is plotted in Fig. 5 as a function of $(\overline{f'^2})^{1/2}$ with $\bar{f}$ as a parameter. The results are symmetric with respect to $\bar{f} = 0.5$. The plots of $\tau_f/\tau_z$ at a particular value of $\bar{f}$ are terminated at the maximum possible value of $(\overline{f'^2})^{1/2}$, i.e., where $P(f)$ degenerates to Dirac delta functions at $f = 0$ and 1. The ratio $\tau_f/\tau_z$ decreases from unity as $(\overline{f'^2})^{1/2}$ increases and $\bar{f}$ approaches either 0 or 1. Thus, there is no correction when $z$ remains in the range 0-1, where $z = f$. Whenever $z < 0$ or $z > 1$, however, $\overline{z'^2} > \overline{f'^2}$, and the correlation for $f$ generally is less than the correlation for $z$, so that $\tau_f/\tau_z$ is less than unity.

Simulations using corrected correlations for $z$ were evaluated for 0.1. Using $10^4$ realizations, values of $\bar{f}$ and $\overline{f'^2}$ were satisfied within 1%, while the correlations were satisfied within 3%. Analogous calculations to find the corresponding corrections of the correlations when $\bar{f}$ and $\overline{f'^2}$ are not the same at the two points are straightforward on a case-by-case basis.

Fig. 5. Ratio of simulated and original integral scales for exponential correlations of functions having clipped-Gaussian probability density functions.

Narrow-band  radiation  model 

Given the distribution of scalar properties along a radiation path, obtained through the stochastic simulation of mixture fractions and the state relationships, spectral radiation intensities are found by solving the equation of radiative transfer along the path. Present results were obtained using the narrow-band model of Ludwig et al. [34], ignoring scattering. The procedure uses the Goody statistical narrow-band model, with the Curtis-Godson approximation to account for absorption along inhomogeneous gas paths. This model accounts for the infrared gas bands of water vapor, carbon dioxide, carbon monoxide, and methane, as well as continuum radiation from soot. Radiation contributions of other species in hydrogen, carbon monoxide, and hydrocarbon flames burning in air are generally negligible, since these species have small concentrations in regions where temperatures are high.
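The narrow-band model of [34] is not reproduced here. As a much simpler sketch of the final step, solving the radiative transfer equation along a non-scattering path, the code below assumes each segment of the path is homogeneous and characterized by a known spectral absorption coefficient (a stand-in for the Goody/Curtis-Godson treatment), so that each segment attenuates the incoming intensity by $\exp(-\kappa_\lambda \Delta s)$ and adds its own emission. The function names and inputs are hypothetical.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
KB = 1.380649e-23    # Boltzmann constant, J/K

def planck_intensity(wavelength, temperature):
    """Blackbody spectral intensity I_b,lambda in W/(m^2 sr m)."""
    x = H * C / (wavelength * KB * temperature)
    return 2.0 * H * C ** 2 / (wavelength ** 5 * math.expm1(x))

def path_intensity(wavelength, temperatures, kappas, ds, i_wall=0.0):
    """March dI/ds = kappa (I_b - I) across homogeneous, non-scattering segments.

    temperatures[k] (K) and kappas[k] (1/m, assumed known) describe segment k
    of length ds (m); each segment transmits tau = exp(-kappa ds) of the
    incoming intensity and emits (1 - tau) I_b at its own temperature.
    """
    intensity = i_wall
    for T, kappa in zip(temperatures, kappas):
        tau = math.exp(-kappa * ds)
        intensity = intensity * tau + (1.0 - tau) * planck_intensity(wavelength, T)
    return intensity

# Hypothetical path: hot core between cooler outer zones, lambda = 2520 nm
temps = [800.0] * 5 + [1800.0] * 10 + [800.0] * 5
kappas = [0.5] * 5 + [2.0] * 10 + [0.5] * 5   # 1/m, assumed
print(path_intensity(2.52e-6, temps, kappas, ds=0.01))
```

For a uniform path the recursion reproduces the closed form $I_{b\lambda}(1 - e^{-\kappa_\lambda L})$; with a narrow-band model the segment transmissivities would instead come from band-averaged line parameters.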


RESULTS  AND  DISCUSSION 

Some comparisons between simulated and measured radiation properties will be considered in order to illustrate the nature and effectiveness of the simulations. The discussion will be limited to results reported by Kounalakis et al. [12,13] for vertical turbulent hydrogen and carbon monoxide jet flames burning in still air. Spectral radiation intensities, $I_\lambda$, were measured for horizontal radiation paths through the axis of the flames. Predictions were based on the present formulation of the stochastic simulation of mixture fraction distributions. As noted earlier, twenty-point fits of spatial correlations in the simulation yielded essentially the same results as the simplified formulation of eqs. (9) and (10); therefore, the following results are based on the simplified formulation. Mixture fraction statistics were estimated from structure predictions using a turbulence model. This introduces uncertainties, although the turbulence model yielded reasonably good predictions of scalar structure for the same flames during earlier studies [8,9].

Predicted and measured probability density functions of $I_\lambda$ are illustrated in Fig. 6 for positions before, near, and after the tip of a hydrogen jet flame (x/d = 50, 90, and 130). These results are for a wavelength $\lambda = 2520$ nm, which is within a prominent infrared gas radiation band of water vapor. Near the burner, the PDFs are relatively symmetric, but they become increasingly skewed as distance from the burner exit increases. This is an effect of air intermittency as mean radiation levels become small, since the spectral intensity can never be negative while the mean value is generated by occasional periods of high radiation levels. The stochastic predictions represent the measurements reasonably well, particularly for the small path diameter, which more closely approximates the negligible path diameter of the simulation.

Fig. 6. Measured and predicted probability density functions of spectral radiation intensities for a turbulent hydrogen/air diffusion flame. From Kounalakis et al. [12].

Predicted and measured temporal power spectra of spectral radiation intensities, $E_\lambda(n)$, are illustrated in Fig. 7 for positions before, near, and after the tip of a carbon monoxide jet flame (x/d = 35, 50, and 65). The power spectra are plotted as a function of frequency, $n$, with both quantities normalized by the characteristic frequency, $\bar{u}_c/x$, where $\bar{u}_c$ is the mean velocity at the flame axis. The spectra exhibit a break frequency, with an energy-containing region having a nearly constant $E_\lambda(n)$ at low frequencies, followed by decay of $E_\lambda(n)$ with increasing frequency beyond the break frequency. Normalized break frequencies increase somewhat with increasing distance from the burner. This follows since the high-temperature region that contributes most to radiant emission is located off axis near the burner and moves toward the axis with increasing distance above the burner. Since temporal integral scales are smallest near the axis (see Fig. 4), this implies a corresponding increase in the break frequency when normalized by properties at the axis.

Fig. 7. Measured and predicted temporal power spectral densities of spectral radiation intensities for a turbulent carbon monoxide/air diffusion flame. From Kounalakis et al. [13].

The predictions provide reasonable estimates of break frequencies and signal properties in the energy-containing region for the results illustrated in Fig. 7. The main deficiency of the predictions is that they underestimate the rate of decay of $E_\lambda(n)$ at high frequencies. Two main reasons can be advanced for this behavior. First, spectral intensities were measured for a finite-diameter radiation path. This tends to average out high-frequency effects over the cross-section of the radiation path, in comparison to predictions, which represent an infinitely thin path. An indication of this effect can be seen by comparing measurements for the 5- and 10-mm-diameter paths appearing in Fig. 7, which show that the spectra decay more rapidly for the larger-diameter path. Second, the exponential correlation function used in the stochastic simulation does not properly truncate high-frequency fluctuations as turbulent microscales are approached, as noted earlier. This causes the predictions to overestimate high-frequency signal levels. Resolving these problems would require extending the stochastic simulation to allow simulation of groups of parallel radiation paths, so that they can be summed over a finite-diameter path, and to accommodate high-frequency cut-offs associated with turbulence microscales when simulating correlations. However, since spectral intensity signal energies are relatively small where the discrepancy becomes significant, such extensions are not needed for most applications.
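The high-frequency behavior attributed to the exponential correlation can be made concrete. For a single-point signal whose autocorrelation is exactly $\exp(-|t|/\tau)$, the one-sided power spectral density is a Lorentzian, flat below a break frequency $1/(2\pi\tau)$ and decaying as $n^{-2}$ above it, with no cut-off at turbulence microscales. The sketch below evaluates this idealized spectrum and checks that it integrates to the signal variance; it describes a point signal with an assumed integral time scale, not the path-integrated intensities of Fig. 7.

```python
import numpy as np

def exponential_psd(n, variance, tau):
    """One-sided PSD of a signal whose autocorrelation is exp(-|t| / tau).

    E(n) = 4 variance tau / (1 + (2 pi n tau)**2): flat for n << 1/(2 pi tau),
    decaying as n**-2 above the break frequency, with no microscale cut-off.
    """
    return 4.0 * variance * tau / (1.0 + (2.0 * np.pi * n * tau) ** 2)

def break_frequency(tau):
    """Frequency at which E(n) falls to half of its low-frequency plateau."""
    return 1.0 / (2.0 * np.pi * tau)

tau = 1.0e-3   # s, assumed integral time scale
print(break_frequency(tau))  # about 159 Hz
```

A real spectrum with a microscale cut-off would fall faster than $n^{-2}$ at high frequencies, which is why a simulation built on this correlation overestimates high-frequency signal levels.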


CONCLUSIONS 

The use of statistical time-series techniques to treat nonlinear interactions during turbulent combustion processes was described. Turbulence-radiation interactions were used to illustrate the method; however, other turbulence interaction problems for combusting flows require a similar treatment of scalar properties. Existing evidence suggests that scalar properties are strongly correlated through state relationships in turbulent diffusion flames and can be simulated by simulating only a conserved scalar, such as mixture fraction. The statistics of mixture fractions in turbulent diffusion flames can be approximated by a clipped-Gaussian PDF and exponential spatial and temporal correlations, at least for the large-scale features that dominate radiation properties. Stochastic simulations using statistical time-series techniques must be modified to account for the finite-range PDF of mixture fraction. This involved transformation to a new variable having a Gaussian PDF and finding appropriate corrections for the correlations in terms of the new variable. An autoregressive process that reproduced the PDFs and the spatial and temporal correlations of mixture fractions yielded an effective simulation to find the statistical properties of spectral radiation intensities from turbulent jet diffusion flames. Thus, additional development and application of the method appear to be warranted.


ACKNOWLEDGEMENT 

This research was supported by the Center for Fire Research of the National Institute of Standards and Technology (formerly the National Bureau of Standards), Grant No. 60NANB8D0833, with H.R. Baum serving as Scientific Officer.


REFERENCES 

1 J.-S. Shuen, A.S.P. Solomon, Q.-F. Zhang and G.M. Faeth, Structure of particle-laden jets: measurements and predictions, American Institute of Aeronautics and Astronautics Journal, 23 (1985) 396-404.

2 T.-Y. Sun and G.M. Faeth, Structure of turbulent bubbly jets, International Journal of Multiphase Flow, 12 (1986) 99-126.

3 A. Picart, A. Berlemont and G. Gouesbet, Modeling and predicting turbulence fields and dispersion of discrete particles transported by turbulent flows, International Journal of Multiphase Flow, 12 (1986) 237-261.

4 M.R. Maxey, The gravitational settling of aerosol particles in homogeneous turbulence and random flow fields, Journal of Fluid Mechanics, 174 (1987) 441-465.

5 R.N. Parthasarathy and G.M. Faeth, Turbulent dispersion of particles in self-generated homogeneous turbulence, Journal of Fluid Mechanics, in press.

6 A.S.P. Solomon, J.-S. Shuen, Q.-F. Zhang and G.M. Faeth, Measurements and predictions of the structure of evaporating sprays, Journal of Heat Transfer, 107 (1985) 679-686.

7 J.-S. Shuen, A.S.P. Solomon and G.M. Faeth, Drop-turbulence interactions in a diffusion flame, American Institute of Aeronautics and Astronautics Journal, 24 (1986) 101-108.

8 J.P. Gore, S.-M. Jeng and G.M. Faeth, Spectral and total radiation properties of turbulent carbon monoxide/air diffusion flames, American Institute of Aeronautics and Astronautics Journal, 25 (1987) 339-345.

9 J.P. Gore, S.-M. Jeng and G.M. Faeth, Spectral and total radiation properties of turbulent hydrogen/air diffusion flames, Journal of Heat Transfer, 109 (1987) 165-171.

10 J.P. Gore and G.M. Faeth, Structure and spectral radiation properties of turbulent ethylene/air diffusion flames, Twenty-First Symposium (International) on Combustion, The Combustion Institute, Pittsburgh, PA, 1986, pp. 1521-1531.

11 J.P. Gore and G.M. Faeth, Structure and radiation properties of luminous turbulent acetylene/air diffusion flames, Journal of Heat Transfer, 110 (1988) 173-181.

12 M.E. Kounalakis, J.P. Gore and G.M. Faeth, Turbulence/radiation interactions in nonpremixed hydrogen/air flames, Twenty-Second Symposium (International) on Combustion, The Combustion Institute, Pittsburgh, PA, 1988, pp. 1281-1290.

13 M.E. Kounalakis, J.P. Gore and G.M. Faeth, Mean and fluctuating radiation properties of turbulent nonpremixed carbon monoxide/air flames, Journal of Heat Transfer, 111 (1989) 1021-1030.

14 R.H. Kraichnan, Diffusion by a random velocity field, Physics of Fluids, 13 (1970) 22-31.

15 G.E.P. Box and G.M. Jenkins, Time Series Analysis, Holden-Day, San Francisco, CA, revised edition, 1976, pp. 47-84.

16 S.P. Burke and T.E.W. Schumann, Diffusion flames, Industrial and Engineering Chemistry, 20 (1928) 998-1004.

17 R.W. Bilger, Turbulent jet diffusion flames, Progress in Energy and Combustion Science, 1 (1976) 87-109.

18 R.W. Bilger, Reaction rates in diffusion flames, Combustion and Flame, 30 (1977) 277-284.

19 S. Gordon and B.J. McBride, Computer Program for Calculation of Complex Chemical Equilibrium Compositions, Rocket Performance, Incident and Reflected Shocks, and Chapman-Jouguet Detonations, NASA SP-273, Washington, DC, 1971.

20 K.C. Smyth, J.H. Miller, R.C. Dorfman, W.G. Mallard and R.J. Santoro, Soot inception in a methane/air diffusion flame as characterized by detailed species profiles, Combustion and Flame, 62 (1985) 157-181.

21 K. Saito, F.A. Williams and A.S. Gordon, Structure of laminar coflow methane-air diffusion flames, Journal of Heat Transfer, 108 (1986) 640-648.

22 Y.R. Sivathanu and G.M. Faeth, Generalized state relationships for scalar properties in nonpremixed hydrocarbon/air flames, Combustion and Flame, in press.

23 J.H. Kent and F.A. Williams, Extinction of laminar diffusion flames for liquid fuels, Fifteenth Symposium (International) on Combustion, The Combustion Institute, Pittsburgh, PA, 1974, pp. 315-325.

24 S.-M. Jeng and G.M. Faeth, Species concentrations and turbulence properties in buoyant methane diffusion flames, Journal of Heat Transfer, 106 (1984) 721-727.

25 Y.R. Sivathanu, J.P. Gore and G.M. Faeth, Scalar properties in the overfire region of sooting turbulent diffusion flames, Combustion and Flame, 73 (1988) 315-329.

26 Y.R. Sivathanu and G.M. Faeth, Soot volume fractions in the overfire region of turbulent diffusion flames, Combustion and Flame, 81 (1990) 133-149.

27 Y.R. Sivathanu and G.M. Faeth, Temperature/soot volume fraction correlations in the fuel-rich region of buoyant turbulent diffusion flames, Combustion and Flame, 81 (1990) 150-165.

28 F.C. Lockwood and A.S. Naguib, The prediction of the fluctuations in the properties of free, round-jet, turbulent diffusion flames, Combustion and Flame, 24 (1975) 109-124.

29 M.-C. Lai and G.M. Faeth, Turbulence structure of vertical adiabatic wall plumes, Journal of Heat Transfer, 109 (1987) 663-670.

30 M.E. Kounalakis and G.M. Faeth, Measurement of mixture-fraction correlations in turbulent jet diffusion flames, Proceedings of the Fall Technical Meeting, Eastern Section of the Combustion Institute, Pittsburgh, PA, 1989.

31 S. Corrsin and M.S. Uberoi, Spectra and Diffusion in a Round Turbulent Jet, NACA Report No. 1040, Washington, DC, 1951.

32 H.A. Becker, H.C. Hottel and G.C. Williams, The nozzle-fluid concentration field of the round, turbulent, free jet, Journal of Fluid Mechanics, 30 (1967) 285-303.

33 W.L. Grosshandler and P. Joulain, The effect of large-scale fluctuations on flame radiation, Progress in Astronautics and Aeronautics, 105(II) (1986) 123-152.

34 C.B. Ludwig, W. Malkmus, J.E. Reardon and J.A. Thomson, Handbook of Infrared Radiation from Combustion Gases, NASA SP-3080, Washington, DC, 1973.


Discussion




Chemometrics and Intelligent Laboratory Systems, 10 (1991) 211-212
Elsevier Science Publishers B.V., Amsterdam


Nonequilibrium  chemistry  and  flamelet 
modeling  of  nonpremixed  turbulent 
reacting  flows 


Mitchell  D.  Smooke 

Department of Mechanical Engineering, Yale University, New Haven, CT 06520 (USA)


Practical combustion systems often involve the burning of nonpremixed fuel/air systems in a turbulent flow environment. While the ultimate modeling of such nonpremixed systems will inevitably involve the direct solution of the three-dimensional time-dependent conservation equations of mass, momentum, species balance and energy, such a task is computationally infeasible on even the largest supercomputers at the current time. The primary difficulty with such an approach is that there are large variations (orders of magnitude) in the length scales present in the reacting flow. Resolving the relevant solution structure requires computational resources that currently do not exist. As a result, the modeling of nonpremixed turbulent reacting flows requires the introduction of a number of simplifying assumptions to make the problem more tractable. One of these methods, the laminar flamelet model, considers a turbulent flame to be composed of an ensemble of thin laminar diffusion flames. It can be shown that these flamelets have a one-dimensional structure normal to the surface of the stoichiometric mixture [1]. The model is applicable if the length scales of the turbulent eddies are much larger than the reaction zone thickness of the flamelets. The structure of these flames is often described in terms of a conserved scalar Z called the mixture fraction, which can be considered to be the fuel element mass fraction in the system. Variants of the laminar flamelet approach differ primarily in the chemical approximations used to describe the flamelets. In some situations local thermodynamic equilibrium chemistry is appropriate; in other cases finite rate chemistry is needed. In the discussion that follows we consider the incorporation of finite rate chemistry into flamelet models of nonpremixed turbulent combustion.

Due to the spatial variation in the stretching of the turbulent flame, the flamelets are subjected instantaneously to a certain rate of strain. This can be represented in terms of the scalar dissipation $\chi_{st}$ at the point of stoichiometry [2]:

$$\chi_{st} = 2a\,\frac{C}{Pr}\left(\frac{\partial Z}{\partial \eta}\right)^{2}_{st} \qquad (1)$$

where a is the strain rate, C is the Chapman-Rubesin parameter, Pr is the Prandtl number and η is a density-weighted coordinate. The implication of this model is that at any point in space the instantaneous local composition of the turbulent flame is that of the diffusion flamelet. Local conditions may be viewed as corresponding to a flamelet in a flamelet family that is parameterized by the degree of stretching $\chi_{st}$. The structure of the flamelet provides a unique relationship between any scalar S and Z. We write this in the form

$$S = S(Z, \chi_{st}) \qquad (2)$$

We treat Z and $\chi_{st}$ as random variables whose joint probability density function (PDF) $P(Z, \chi_{st})$ must be determined. In practice the PDF is factored such that

$$P(Z, \chi_{st}) = P(Z)\,P(\chi_{st}) \qquad (3)$$

Ordinarily, P(Z) is taken to be a beta-function PDF and $P(\chi_{st})$ is taken to be the log-normal distribution [3]. The mean and variance of the log-normal distribution are computed from the first moment of $\chi_{st}$ and from the Favre-averaged turbulent dissipation and turbulent kinetic energy, respectively. With this formalism established, scalar properties are determined by postulating a set of burned and unburned states [2]. In particular, we can write the Favre-averaged value of the burned contribution to the scalar S as

$$\tilde{S} = \int_0^{\chi_{q}} \int_0^1 S(Z, \chi_{st})\, P(Z)\, P(\chi_{st})\, dZ\, d\chi_{st} \qquad (4)$$

where $\chi_{q}$ represents the maximum value of the scalar dissipation at which a flame exists. A similar integral can be written for the unburnt states. The joint dependence of S on Z and $\chi_{st}$ is parametric in $\chi_{st}$ and is characterized by a limited number of data files that constitute a flame library [4]. Evaluation of the properties in (4) is carried out by replacing the integrals by numerical quadratures. The Favre-averaged properties are then utilized in a boundary layer k-ε turbulent flow model.
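The quadrature step for eq. (4) can be sketched numerically. This is a minimal illustration under stated assumptions, not the author's code: the beta PDF for Z is parameterized by an assumed mean and variance (method of moments), the log-normal PDF for the scalar dissipation is given directly by the mean `mu_ln` and standard deviation `sigma_ln` of ln(chi) rather than derived from the turbulence model as described above, both integrals use the midpoint rule, and `S` is a hypothetical callable standing in for a flame-library lookup. All function names are illustrative.

```python
import math
import numpy as np


def beta_pdf(z, a, b):
    """Beta(a, b) density on (0, 1), evaluated through log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm + (a - 1.0) * np.log(z) + (b - 1.0) * np.log1p(-z))


def beta_params(z_mean, z_var):
    """Method of moments: beta parameters matching the mean and variance of Z."""
    k = z_mean * (1.0 - z_mean) / z_var - 1.0
    return z_mean * k, (1.0 - z_mean) * k


def favre_mean_burned(S, z_mean, z_var, mu_ln, sigma_ln, chi_q, nz=400, nu=200):
    """Approximate eq. (4): the integral over 0 < chi < chi_q and 0 < Z < 1 of
    S(Z, chi) P(Z) P(chi), with P(Z) a beta PDF and P(chi) log-normal.
    mu_ln, sigma_ln: mean and standard deviation of ln(chi), assumed given."""
    a, b = beta_params(z_mean, z_var)
    # The midpoint rule in Z never touches the endpoints, where a beta PDF may diverge
    dz = 1.0 / nz
    z = (np.arange(nz) + 0.5) * dz
    pz = beta_pdf(z, a, b)

    # For log-normal chi, P(chi) dchi = N(u; mu_ln, sigma_ln) du with u = ln(chi),
    # so integrate in u up to ln(chi_q), truncating the far Gaussian tails
    u_lo = mu_ln - 8.0 * sigma_ln
    u_hi = min(math.log(chi_q), mu_ln + 8.0 * sigma_ln)
    if u_hi <= u_lo:
        return 0.0
    du = (u_hi - u_lo) / nu
    u = u_lo + (np.arange(nu) + 0.5) * du
    pu = np.exp(-0.5 * ((u - mu_ln) / sigma_ln) ** 2) / (sigma_ln * math.sqrt(2.0 * math.pi))

    Z, CHI = np.meshgrid(z, np.exp(u), indexing="ij")
    integrand = S(Z, CHI) * pz[:, None] * pu[None, :]
    return float(integrand.sum() * dz * du)


# Hypothetical stand-in for a flame-library lookup: a scalar peaking near
# Z = 0.055 whose peak value decreases as chi grows.
def toy_scalar(Z, chi):
    return np.exp(-((Z - 0.055) / 0.05) ** 2) / (1.0 + 0.1 * chi)


print(favre_mean_burned(toy_scalar, z_mean=0.1, z_var=0.005, mu_ln=0.0, sigma_ln=1.0, chi_q=20.0))
```

Integrating in ln(chi) rather than chi keeps the log-normal density well resolved with a uniform grid; the cut at ln(chi_q) implements the upper limit of the burned-state integral.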

The individual laminar diffusion flamelets in the flamelet library are modeled by considering counterflowing streams of fuel and oxidizer in either a Tsuji or a Seshadri-type burner [5,6]. A similarity solution is sought for the two-dimensional governing conservation equations. The flamelet problem is then reduced to solving a nonlinear two-point boundary value problem along the stagnation point streamline. Individual flamelet calculations can be made for a given chemical mechanism, transport approximation and set of jet velocities. Once the computation is completed, the solution can be stored as a function of the mixture fraction, with each flamelet characterized by the scalar dissipation at the point of stoichiometric mixture.
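A flamelet library stored this way (one profile S(Z) per value of the stoichiometric scalar dissipation) can be queried at arbitrary (Z, chi) pairs. The sketch below is a hypothetical data structure, not the library format used in this work: the class name is illustrative, and linear interpolation in Z and then in chi, with chi clamped to the stored range, is an assumed lookup rule.

```python
import numpy as np


class FlameletLibrary:
    """Hypothetical flamelet library: one stored profile S(Z) per value of chi_st."""

    def __init__(self, chi_values, z_grid, profiles):
        if len(chi_values) < 2:
            raise ValueError("need at least two flamelets to interpolate in chi")
        order = np.argsort(chi_values)
        self.chi = np.asarray(chi_values, dtype=float)[order]
        self.z = np.asarray(z_grid, dtype=float)
        # profiles[i] holds S on z_grid for the flamelet with scalar dissipation chi[i]
        self.profiles = np.asarray(profiles, dtype=float)[order]

    def __call__(self, Z, chi):
        """Interpolate linearly in Z within each flamelet, then linearly in chi
        between the two bracketing flamelets (chi is clamped to the stored range)."""
        Z, chi = np.broadcast_arrays(np.asarray(Z, dtype=float), np.asarray(chi, dtype=float))
        chi = np.clip(chi, self.chi[0], self.chi[-1])
        # Every stored flamelet evaluated at the requested Z: shape (n_chi, *Z.shape)
        s_at_z = np.stack([np.interp(Z, self.z, prof) for prof in self.profiles])
        # Index of the upper bracketing flamelet for each chi
        hi = np.asarray(np.clip(np.searchsorted(self.chi, chi), 1, len(self.chi) - 1))
        lo = hi - 1
        w = (chi - self.chi[lo]) / (self.chi[hi] - self.chi[lo])
        s_lo = np.take_along_axis(s_at_z, lo[None, ...], axis=0)[0]
        s_hi = np.take_along_axis(s_at_z, hi[None, ...], axis=0)[0]
        return (1.0 - w) * s_lo + w * s_hi
```

An instance is callable with the same (Z, chi) signature used for the scalar in a quadrature over eq. (4), so it can be evaluated directly on a meshgrid of quadrature nodes.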
Abstract


Randić, M., 1991. Novel graph theoretical approach to heteroatoms in quantitative structure-activity relationships. Chemometrics and Intelligent Laboratory Systems, 10: 213-227.

A novel approach to the characterization of heteroatoms in graph theoretical approaches to quantitative structure-activity relationships (QSAR) is outlined. The basis of the approach is the use of diagonal entries of the adjacency matrix as variable parameters, in full analogy to the well known generalization of the Hückel Molecular Orbital (HMO) method when extended to heteroconjugated systems. The approach is illustrated on clonidine-like compounds, where carbon and chlorine atoms are discriminated by using 0.20 as the diagonal entry for chlorine atoms. Derived weighted path numbers are used as descriptors, and a multiple regression based on three descriptors resulted in the correlation coefficient R = 0.977 and the standard error S = 0.233. This represents a substantial improvement over the best traditional QSAR analysis, which involves five descriptors (in a nonlinear correlation equation with R = 0.964 and S = 0.301). A detailed comparison is made with available QSAR results, and the advantages (as well as limitations) of graph theoretical descriptors are discussed.


* This paper is dedicated to Professor Dušan Hadži from the Boris Kidrič Institute in Ljubljana, Slovenia, Yugoslavia.


INTRODUCTION

In contrasting graph theoretical schemes [1] with traditional quantitative structure-activity relationship (QSAR) methods [2] one cannot fail to observe the complementarity of the two approaches. Traditional QSAR is mostly based on a large number of empirical parameters. The graph theoretical approaches use a rather small set of structural invariants, graph invariants in particular. In traditional QSAR one uses statistical methods in order to select critical descriptors and derive a structure-activity correlation. In graph theory one manipulates structures algebraically, using partial order and ranking based on selected standards. Of course, graph theoretical descriptors also lead to structure-property and structure-activity correlations based on statistical analysis [3-6]. The applications of graph theory [7] to QSAR cover a variety of topics, from the study of various physicochemical data to biological activities and toxicities (refs. 1, 2 and 5-7, and references cited therein, and refs. 8-24), including even the use of graph theoretical descriptors in pattern recognition [25]. But the prime distinction between graph theoretical schemes and traditional QSAR is that the former is 'structure-explicit' while the latter is 'structure-cryptic' [1]. The former uses well defined mathematical invariants which have a direct structural interpretation, while the latter are mostly
expressed as physicochemical properties that remain to be interpreted structurally. For example, the molar refraction (MR) has frequently been used as a descriptor in traditional QSAR, but how MR depends on molecular structure, so that it could be predicted once the chemical structure is known, still remains to be understood. The distinction can be illustrated by reference to a particular study of selected physicochemical properties of over a hundred compounds by Cramer [26]. Using principal component analysis, Cramer showed that aqueous solvation or the activity coefficient in water, partition coefficient (octanol/water), boiling points, molar refraction, liquid state molar volumes and heats of vaporization, which mutually show variable pairwise correlations ranging from nonexistent to very high, can all be well explained (95% of the variance) by two variables. This illustrates well the presence of a structural factor, as yet to be identified, on which all the studied properties critically depend. According to Cramer [22], "... it seems possible that molecular connectivity indices may represent alternative axes for compound subsets within 'BC(DEF) space'." The uncertainty here reflects the intrinsic difficulty associated with attempts to express mathematical properties (graph invariants) as a combination of physicochemical variables, instead of the other way round. If Cramer is correct in identifying the connectivity indices as alternative axes of physicochemical space, the two major variables being identified as 'bulkiness' and 'cohesiveness', that would only indicate that 'bulkiness' and 'cohesiveness', as molecular properties, correlate with the connectivity indices.


LIMITATIONS OF GRAPH THEORETICAL APPROACHES

Graphs depict molecular connectivity and as such are devoid of information on heteroatoms and the spatial arrangements of atoms. It is not surprising, then, that to uninitiated people graph theoretical schemes appear at best unpromising, if not doomed to failure. Equally, graph theory does not produce numerical data analogous, say, to quantum mechanical computations. It can nevertheless lead to quantitative results when information on selected standards is available. As long as the molecules considered are structurally closely related (e.g., they have the same heteroatoms in similar locations and the same stereochemistry), graphs can be employed and useful correlations derived [28-36]. The neglect of heteroatoms and of spatial molecular architecture may appear to be a severe limitation of graph theoretical models. However, for QSAR studies concerned with a search for optimal compounds once lead compounds are known, graph theoretical schemes were found to be quite successful, not only in suggesting a more potent compound, but in providing assurance that the compound thus found is the best possible one within the given family [37].

An extension of molecular graphs to molecular structures by embedding graphs on a regular three-dimensional grid has only recently been considered [38-40]. By using topographic (geometrical) matrices, rather than topological (graph theoretical) adjacency matrices, one can differentiate between different conformers, such as cis and trans, boat and chair, between individual rotational isomers, etc. Importantly, the derived molecular descriptors are quite analogous to molecular connectivity indices, weighted path numbers and other graph-related invariants, except that now they are sensitive to precise molecular geometry. Moreover, the indicated generalization from adjacency (connectivity) to topography (geometry) suggests further generalizations of graphs in which structural invariants are derived from other matrices associated with molecules, such as the bond order matrix, the bond polarizability matrix and even the Hamiltonian matrix [41]. It appears that we are only at the beginning of new directions in our search for useful molecular descriptors. However, here we will restrict our attention to another generalization of graphs: the problem of treating heteroatoms.

Applications of graph theoretical methods in QSAR to molecules with heteroatoms in more general positions have led to a number of generalizations of simple graphs. Kier and Hall [3] introduced the concept of valence connectivities, in which they associated different 'correction' factors with different heteroatoms. Kupchik [42] considered the use of van der Waals radii of heteroatoms as a source of their discrimination by suitably modifying the connectivity indices. Hansen [43] similarly considered purely empirical correction factors for heteroatoms. In this paper we will outline yet another alternative approach to heteroatoms, which has some analogy with the generalization of Hückel Molecular Orbital (HMO) methods from hydrocarbons to heteroconjugated compounds [44] and represents an extension of earlier work on the sensitivity of path numbers to variation in bonds involving oxygen and nitrogen [45].

CLONIDINE-LIKE IMIDAZOLIDINES: AN ILLUSTRATION OF A GRAPH THEORETICAL APPROACH TO HETEROATOMS

We have selected clonidine and clonidine-like imidazolidines (compounds having a hypotensive action) because in these molecules chlorine (as heteroatom) appears in different locations, and therefore the compounds offer a suitable test of whether the suggested novel descriptor for a heteroatom is adequate. The clonidine-like compounds examined here have been extensively studied in the past [46-48], including a particularly detailed study by Timmermans and Van Zwieten [49] based on traditional QSAR. Thus it will be possible to make a detailed comparison between the correlations based on molecular properties as descriptors and our results derived from the use of graph theoretical indices as descriptors. Moreover, the data set used includes two extreme potency values which would be expected to give trouble in curve fitting and cross-validation, and hence the data enable a critical test of the modelling of biological activity.

The need for a novel approach to heteroatoms in graph theoretical methods becomes apparent from a comparison of the biological activities of the following compounds:

Compound              Activity (ED50, μg/kg)
2,4-Dimethyl          810
2-Methyl, 4-chloro    275
2,4-Dichloro           61
2-Chloro, 4-methyl     53


Fig. 1. Numbering of the compounds and diagrams of the variable fragment of the 2-(arylimino)imidazolidines considered. Chlorine atoms are indicated as small circles.


The four compounds selected illustrate a lack of bond additivity for the biological activity (experimental ED50 values in μg/kg, obtained from dose-response curves following intravenous administration to anesthetized, normotensive rats, i.e., the in-vivo effective dose which produces anesthesia in 50% of the population). Any bond additive scheme should interpolate the data on derivatives with a single methyl and a single chlorine between the dimethyl and the dichloro derivatives, but this apparently is not possible here: the 2-chloro, 4-methyl derivative (53) lies outside the range spanned by the dichloro (61) and dimethyl (810) derivatives.
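The non-additivity can be checked with the four ED50 values listed above. The sketch assumes a position-wise additive model, f(x2, x4) = c + g2(x2) + g4(x4), under which the two mixed derivatives must sum exactly to the dimethyl plus dichloro values; this model and the function names are illustrative, not taken from the paper. It also applies the interpolation test stated in the text, on both the raw and the log10 scale.

```python
import math

# ED50 values (μg/kg) quoted in the text, keyed by (substituent at 2, substituent at 4)
ED50 = {
    ("Me", "Me"): 810.0,  # 2,4-dimethyl
    ("Me", "Cl"): 275.0,  # 2-methyl, 4-chloro
    ("Cl", "Cl"): 61.0,   # 2,4-dichloro
    ("Cl", "Me"): 53.0,   # 2-chloro, 4-methyl
}


def additivity_residual(values, transform=lambda v: v):
    """Under f(x2, x4) = c + g2(x2) + g4(x4), the combination
    f(Me,Cl) + f(Cl,Me) - f(Me,Me) - f(Cl,Cl) is exactly zero."""
    t = {key: transform(v) for key, v in values.items()}
    return t[("Me", "Cl")] + t[("Cl", "Me")] - t[("Me", "Me")] - t[("Cl", "Cl")]


def between(x, lo, hi):
    return min(lo, hi) <= x <= max(lo, hi)


raw = additivity_residual(ED50)
logged = additivity_residual(ED50, math.log10)
# Mixed derivatives that fail the interpolation test described in the text
outside = [key for key in (("Me", "Cl"), ("Cl", "Me"))
           if not between(ED50[key], ED50[("Me", "Me")], ED50[("Cl", "Cl")])]
print(raw, round(logged, 3), outside)
```

A residual far from zero on either scale, together with the 2-chloro, 4-methyl compound falling outside the dimethyl/dichloro range, is what rules out a simple additive description for these four compounds.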


TRADITIONAL QSAR CORRELATIONS BETWEEN PROPERTIES AND ACTIVITY

In Fig. 1 we depict the molecular skeletons of 18 imidazolidines from the collection of 27 reported in the study of Timmermans and Van Zwieten [49]. We have restricted our attention to clonidine derivatives with chlorine as the only heteroatom. The nine compounds not considered here involve bromine, fluorine, nitrogen and oxygen, and offer too small a sample to allow one to determine empirically the graph theoretical parameters that discriminate between these heteroatoms. The QSAR parameters considered by Timmermans and Van Zwieten include:

(a) ΔpKa, which refers to the substituent effect on the dissociation of the imidazolidines in water expected to prevail under physiological conditions;

(b) π-electron charge densities from quantum chemical calculations, derived for free bases and protonated species;

(c) the energies of the highest occupied molecular orbital (HOMO) and lowest empty (unoccupied) molecular orbital (LEMO or LUMO); in particular, those of the protonated species were considered as molecular descriptors;

(d) the lowest electronic excitation energies of the molecules (given by the difference of the HOMO and LUMO energies);

(e) log P' (apparent partition coefficient) from the octanol-0.1 M phosphate buffer, pH 7.4, system;

(f) the hydrophobic constant π (in fact the summation over the substituent π values), adopted as a measure of hydrophobic interactions;

(g) parachor, defined by Sugden [50] as the product of the molecular volume and the fourth root of the surface tension, a measure of molecular size (along a series where surface tension is constant), perhaps related (via surface tension) to an overall lipophilic behavior of the molecules;

(h) Taft's steric constant [51], as expanded by Hansch [52], to account for the steric properties;

(i) the molar refraction, MR, at the wavelength of the sodium D doublet line, as a representation of the molecular volume.
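Definition (g) can be written as P = V_m γ^(1/4), with the molar volume V_m = M/ρ. A minimal sketch, assuming the classical unit convention (g/mol, g/cm³, dyn/cm); the function name is hypothetical.

```python
def parachor(molar_mass, density, surface_tension):
    """Sugden parachor: molar volume times the fourth root of the surface tension.
    Assumed units: molar_mass in g/mol, density in g/cm^3, surface_tension in dyn/cm."""
    molar_volume = molar_mass / density  # cm^3/mol
    return molar_volume * surface_tension ** 0.25
```

Because the surface tension enters only through its fourth root, the parachor is dominated by the molar volume, which is why it serves mainly as a size measure.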

Observe the rather lengthy list of molecular properties, experimental or computed, used in the search for structure-activity correlations. This kind of QSAR should be termed property-activity, because the analysis is confined mostly to property-activity relationships. Let us point out the difficulties in counting the parameters used in such analyses. A lack of information on the degrees of freedom (i.e., the number of independent parameters) involved leads to ambiguities about the reported statistics. Part of the problem originates with difficulties in tracing the underlying assumptions and the number of parameters used there. For instance, what variant of MO calculations is used, and what assumptions and approximations are involved? How does one estimate molecular volume? What is involved in determining the numerical magnitude of the volume? How does one scale various contributions? To what extent are the selected QSAR parameters internally consistent, and to what extent are individual parameters independent? How does a change in the choice of one descriptor influence other parameters if the internal consistency of the model is to be preserved? It may be difficult to answer these questions. It is this accumulation of many small steps, each perhaps well defined, which eventually makes it difficult to identify the degrees of freedom used in subsequent correlations. The situation may be contrasted with the use of graph theoretical descriptors, whose number is always known and which are defined a priori.
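The degrees-of-freedom issue can be made concrete with the two statistics reported throughout this paper. The sketch fits a least-squares model with an intercept and returns the multiple correlation coefficient R and a standard error S; the convention S = sqrt(RSS/(n - p - 1)) for p descriptors is an assumption, since this section does not state how S was computed. Under that convention, R can only grow as descriptors are added, while S is charged for every fitted parameter.

```python
import numpy as np


def regression_stats(X, y):
    """Fit y = b0 + X b by least squares and return (R, S).
    R: multiple correlation coefficient.
    S: standard error of estimate, sqrt(RSS / (n - p - 1)) for p descriptors
       (an assumed convention; requires n > p + 1)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])  # intercept column followed by descriptors
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    R = np.sqrt(max(0.0, 1.0 - rss / tss))
    S = np.sqrt(rss / (n - p - 1))
    return float(R), float(S)
```

For example, adding a quadratic term to a one-descriptor fit on four points never lowers R, but it leaves only one residual degree of freedom, so S can rise; this is why the number of independent parameters behind a reported (R, S) pair matters.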

The correlations reported by Timmermans and Van Zwieten [49] are summarized in Table 1. We give the statistics and the correlation equations corresponding to a set of 18 methyl and chloro derivatives, which we selected from the initial set of 27 compounds. The revised correlations gave slightly better statistics, as expected, in view of the fact that the sample of compounds studied is now more homogeneous.


TABLE 1

Summary of the correlations based on eighteen 2-(arylimino)imidazolidine compounds having only chlorine as heteroatoms

Regression                                                         R      S

0.546 log P - 0.222 (log P)²                                       0.786  0.629

-0.0?? (Par)² + 0.119 Par - 0.534 ΔpKa
    + 2.707 HOMO + 4.984 EE - 15.583                               0.964  0.301

-0.717 ΔpKa - 0.057                                                0.675  0.726

0.1? (Par)² - 0.0003 Par - 8.842                                   0.731  0.691

-0.885 ΔpKa + 6.687 HOMO + 7.238 EE + 22.651                       0.789  0.646

-0.0003 (Par)² + 0.096 Par - 0.572 ΔpKa - 7.849                    0.902  0.454

We may briefly summarize the main results of Timmermans and Van Zwieten as follows:

(a) correlation coefficients R of over 0.950 (with an accompanying standard deviation, S, of less than 0.350) require five molecular descriptors;

(b) the single descriptor log P (as an indicator of drug transport processes) gives a correlation with R = 0.529 (S = 0.864);

(c) the major single variable of the best correlation is ΔpKa, with R = 0.482 and S = 0.892;

(d) the best two-parameter correlation involves parachor (linear and quadratic terms) and increases the correlation coefficient to R = 0.656 (S = 0.784);

(e) the best three-term expression (based on parachors and ΔpKa) achieves the somewhat respectable correlation coefficient of R = 0.853 (S = 0.544).

Timmermans and Van Zwieten [49] concluded their study by examining the role of the hydrophobic constant π and the role of the steric substituent parameter. Each case, in comparison with the best five-parameter correlation, shows a slightly reduced correlation coefficient (R = 0.912 and R = 0.943, respectively) and an increased standard deviation (S = 0.455 and S = 0.369, respectively). The traditional QSAR study of Timmermans and Van Zwieten well illustrates the various choices in multiple regression analysis, resulting in a correlation equation using five descriptors with a high coefficient of multiple regression.

How would graph theoretical schemes fare in comparison?


THE CONNECTIVITY INDEX FOR HETEROATOMS

In order to consider the above question we first have to consider an adequate graph theoretical approach to heteroatoms. The index, initially called the branching index [53] and subsequently, quite appropriately, renamed by Kier et al. [54] as the connectivity index, was designed from an analysis of selected physicochemical properties of alkanes. Firstly, one orders isomers with respect to a property of interest. Thus, for example, in the case of the hexanes and their boiling points we obtain the following sequence:

2,2-dimethylbutane < 2,3-dimethylbutane < 2-methylpentane < 3-methylpentane < n-hexane

By differentiating the bond types involved, the above ordering leads to the inequalities shown below, where (m, n) represents a C-C bond type, m and n being the numbers of neighbors of the two bonded carbon atoms:

$$
\begin{aligned}
&[(1,2) + 3(1,4) + (2,4)] < [4(1,3) + (3,3)] \\
&\quad < [(1,2) + 2(1,3) + (2,2) + (2,3)] \\
&\quad < [2(1,2) + (1,3) + 2(2,3)] \\
&\quad < [2(1,2) + 3(2,2)]
\end{aligned}
$$

Similar inequalities follow from the ordering of other alkanes. The bond type contributions

(1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4), (3, 3), (3, 4) and (4, 4)

are viewed as unknown variables, which will need to be determined. Instead of searching for individual (m, n) values, one finds that a simple algorithm, $(m, n) = (mn)^{-1/2}$, generates an acceptable solution. Hence, this single assumption defines the bond contributions to the connectivity index [53].

It may appear amazing that a simple ad hoc mathematical construction, the connectivity index, performs so well. But there ought to be no surprise, because the index was constructed to be a solution to an ordering of structures, an ordering which parallels the relative magnitudes of selected properties. The success of the connectivity index lies in its design. One can interpret the variable bond weights as the relative contributions of bonds in a typical molecular additivity scheme, when bonds are differentiated according to the number of nearest neighbors. The bond types (1, 2), (1, 3) and (1, 4), for example, correspond to bonds between primary and secondary, primary and tertiary, and primary and quaternary carbon atoms, respectively. The connectivity index attributes different 'volume' and 'surface' contributions to these different bond types, simulating the relative volume and surface fragment contributions.


WEIGHTED PATH NUMBERS AS MOLECULAR DESCRIPTORS

A single molecular descriptor will not suffice in many applications. When extending the basis of descriptors one can either (i) consider a collection of additional, structurally unrelated descriptors or (ii) design a number of different but structurally related descriptors. The higher connectivity indices [55] represent an illustration of the latter. They were defined by extending the bond as a fragment to pairs of bonds and to several consecutive bonds as larger molecular fragments. In this way one not only increases the number of descriptors available for regression, but also facilitates the use of sequences as mathematical objects to represent structures. Other choices of structurally related descriptors include extended connectivities [56], path numbers [57], weighted path numbers [58] and distance sums [59]. We will use here weighted path numbers (briefly outlined below).


In  Table  2  we  illustrate  weighted  path  numbers  on 
a  ten-atom  common  fragment  for  compounds  of 
Fig.  1: 

[Diagram: the ten-atom fragment, atoms numbered 1-10]

which represents a variable fragment of the graph of clonidine-like molecules. The weighting factors for the individual bond types are the same ones introduced in the definition of the connectivity index.

Let  us  emphasize  the  wealth  of  data  in  Table  2. 
Firstly,  for  each  atom  separately  we  obtain  path 
sequences,  these  are  the  numbers  listed  in  separate 
rows.  As  a  sum  of  atomic  path  sequences  we  show 
in  the  last  row  the  corresponding  sequence  for  the 
molecule.  The  first  number  gives  the  number  of 
atoms,  but  alternatively  this  can  be  replaced  by 
the  ‘molecular’  zero-order  connectivity  index  of 
Kier  and  Hall  (3).  The  second  number  in  the 
molecular  sequence  is  the  connectivity  index, 
which  can  be  viewed  as  the  molecular  path  num¬ 
ber  associated  with  paths  of  length  one,  i.c.  bonds 
The  successive  path  counts  correspond  to  higher 
connectivity  indices,  although  they  differ  some- 


TABLE  2 

Weighted  path  numbers  for  a  ten-atom  fragment  of  the  2.6-dimethyl  derivative  of  clonidme 

Rows  gisc  weighted  paths  tor  individual  atoms,  the  last  row  (obtained  by  summing  atomic  contributions)  represents  a  chaiaclenza 
non  of  the  molecule  (molecular  fragment)  as  a  whole. 


Atom 

P\ 

Pi 

Pt 

Pa 

Pi 

Pa 

Pi 

Atomic  ID 

I 

0817 

0272 

0181 

0.179 

0037 

0019 

OOOS 

2  516 

2 

1.150 

0222 

0219 

0<W5 

0023 

0  001 

OQOI 

2674 

3 

1 

0.929 

0.136 

0068 

0028 

0016 

3177 

4.8 

1.319 

0426 

0.302 

0064 

0019 

0001 

3.170 

5.7 

0.90$ 

0  622 

0.193 

0175 

0  032 

0016 

2  945 

6 

1 

0408 

0  372 

0091 

0  082 

2953 

9.10 

0577 

0428 

0246 

0.  75 

0037 

0029 

2  497 

Molecule 

Molecular  ID 

4.788 

2.392 

1.195 

0605 

0203 

0071 

0013 

19  271 
what in the definition in that here the weight factors of the ‘connecting’ atoms are used twice. But that is a minor difference that changes the results quantitatively, not qualitatively, and we may continue to refer to these as ‘higher’ connectivity indices. In addition to the quantities already mentioned, we may also consider adding atomic contributions, not along the columns, as was done in deriving the molecular path numbers, but along the rows. We then obtain a characteristic number for each atom, the so-called atomic identification (ID) number. As one can see, these atomic ID numbers are sensitive to the atomic environment and tend to differ even for atoms in highly similar environments. Significantly, however, smaller changes in the environment are accompanied by smaller variations in the atomic ID numbers. By adding all atomic ID numbers (or alternatively by adding the molecular path numbers, taking proper account of the role of the zero-order connectivity index), one obtains the molecular ID number [60]. These molecular ID numbers, which in a way encode the molecular volume, have already been used in some structure-activity clusterings and correlations [58,61]. One ought to view Table 2 as a pool of various molecular descriptors.
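The row and column bookkeeping described above can be sketched in a few lines of Python. This is not the ALLPATH program. As an assumption of the sketch, each bond i-j carries the weight 1/sqrt(d_i d_j) (d = vertex degree, so interior ‘connecting’ atoms enter twice), and a path's weight is the product of its bond weights. The atomic path number p_k sums the weights of all self-avoiding paths of length k starting at a given atom, and the atomic ID adds 1 for the atom itself. Because every path is met once from each of its two ends, the molecular P_k is taken as half the column sum, and the molecular ID as N + ΣP_k for N atoms. These two relations are our reading of Table 2: half the p1 column (9.575/2) gives the tabulated 4.788, and 10 + ΣP_k gives 19.267 against the tabulated 19.271, the small gap presumably reflecting rounding and paths longer than seven bonds. All function names are hypothetical.

```python
import math
from collections import defaultdict


def weighted_paths(adj, bond_weight=None):
    """Weighted path numbers p_k for every atom of a molecular graph.

    adj: dict mapping each atom label to the set of its neighbours.
    bond_weight: optional function (i, j) -> weight of bond i-j; the default
    (an assumption of this sketch) is 1/sqrt(d_i * d_j), d = vertex degree.
    Returns {atom: {k: p_k}}, where p_k sums the weights of all self-avoiding
    paths of length k that start at that atom.
    """
    degree = {atom: len(nbrs) for atom, nbrs in adj.items()}

    def default_weight(i, j):
        return 1.0 / math.sqrt(degree[i] * degree[j])

    weight = bond_weight or default_weight
    result = {}
    for start in adj:
        counts = defaultdict(float)
        # depth-first enumeration; a path's weight is the product of its bond weights
        stack = [(start, frozenset([start]), 1.0, 0)]
        while stack:
            atom, visited, w, length = stack.pop()
            for nbr in adj[atom]:
                if nbr in visited:
                    continue
                w_next = w * weight(atom, nbr)
                counts[length + 1] += w_next
                stack.append((nbr, visited | {nbr}, w_next, length + 1))
        result[start] = dict(counts)
    return result


def atomic_id(paths_of_atom):
    """Atomic ID: 1 for the atom itself (zero-order term) plus all its p_k."""
    return 1.0 + sum(paths_of_atom.values())


def molecular_path_numbers(paths):
    """P_k = half the column sum, since each path is counted from both ends."""
    totals = defaultdict(float)
    for per_atom in paths.values():
        for k, value in per_atom.items():
            totals[k] += value
    return {k: v / 2.0 for k, v in sorted(totals.items())}


def molecular_id(paths):
    """Molecular ID: number of atoms (zero-order term) plus all P_k."""
    return len(paths) + sum(molecular_path_numbers(paths).values())


# usage: n-butane, C1-C2-C3-C4
butane = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
p = weighted_paths(butane)
print(molecular_path_numbers(p))  # P1 is the Randic index, about 1.914
print(molecular_id(p))
```

For n-butane, P1 comes out as 1.914, the familiar Randić connectivity index, as it should: for paths of length one the product weight reduces to the single bond weight 1/sqrt(d_i d_j).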

Is it possible to incorporate heteroatoms in some analogous way in the path count schemes?

The quantities in Table 2 were calculated (by the program ALLPATH [62,63]) from the graph adjacency matrix, which has zeros everywhere (including the diagonal entries) except at places corresponding to a pair of connected atoms, where the entry is 1. A heteroatom X can be discriminated by setting the corresponding diagonal element of the matrix to a value different from zero. This is fully analogous to the treatment of heteroatoms in HMO theory. Spialter [64,65] attempted in this way to record heterosystems in chemical documentation, and even earlier Balandin [66] used the same technique to identify heteroatoms. Dugundji and Ugi [67], in a similar manner, recorded the number of valence electrons of non-carbon atoms in their BE (bond-electron) matrices used to follow chemical reactions. Thus it appears natural to use a variable diagonal entry to discriminate among heteroatoms, a practice which apparently is not novel.
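Setting up the encoding itself is straightforward. In the minimal numpy sketch below (function name and atom numbering hypothetical), the same seven-atom skeleton yields a toluene-like graph when the substituent's diagonal entry is zero and a chlorobenzene-like graph when that entry is set to -0.20, the chlorine value used with ALLPATH later in this article. How ALLPATH converts a diagonal entry into modified path weights is not described in the text, so that step is not attempted here.

```python
import numpy as np


def adjacency_matrix(n_atoms, bonds, diagonal=None):
    """Adjacency matrix of a molecular graph, with optional heteroatom diagonal.

    bonds: iterable of (i, j) atom-index pairs (0-based).
    diagonal: optional {atom index: value}; carbons keep the default 0.
    """
    a = np.zeros((n_atoms, n_atoms))
    for i, j in bonds:
        a[i, j] = a[j, i] = 1.0  # 1 for every pair of connected atoms
    for atom, value in (diagonal or {}).items():
        a[atom, atom] = value  # non-zero diagonal flags a heteroatom
    return a


# hypothetical numbering: ring atoms 0-5, substituent atom 6 on ring atom 0
ring_bonds = [(k, (k + 1) % 6) for k in range(6)]
bonds = ring_bonds + [(0, 6)]
methyl = adjacency_matrix(7, bonds)                       # toluene-like skeleton
chloro = adjacency_matrix(7, bonds, diagonal={6: -0.20})  # chlorine flagged
print(chloro - methyl)  # only the (6, 6) entry differs
```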

In Table 3 we list the weighted path numbers for the same ten-atom fragment of clonidine, but now atoms 9 and 10, corresponding to the chlorines in compound 1, have been assigned a non-zero diagonal entry in the adjacency matrix. The ALLPATH program recognizes the non-zero diagonal entries and modifies the weighted path count accordingly. Hence, by comparing Table 2 and Table 3, we can observe the differences induced by the two chlorine atoms: Table 2 represents the 2,6-dimethyl derivative, compound 6, while Table 3 corresponds to the 2,6-dichloro derivative, compound 1. In the next section we will consider correlations between the eighteen imidazolidines of Fig. 1 using the graph theoretical descriptors from Table 2 and Table 3, and similar data for other compounds of interest.

TABLE 3

Weighted path numbers for the ten-atom fragment with chlorine atoms as heteroatom substituents (labels 9, 10), corresponding to the 2,6-dichloro derivative of clonidine

Observe a similarity between the corresponding path numbers of Tables 2 and 3

Atom      p1     p2     p3     p4     p5     p6     p7     Atomic ID
1         0.816  0.272  0.181  0.191  0.037  0.019  0.008  2.529
2         1.150  0.222  0.234  0.045  0.023  0.009  0.006  2.690
3         1      0.975  0.136  0.068  0.028  0.018         3.225
4, 8      1.387  0.426  0.310  0.064  0.052  0.005  0.004  3.248
5, 7      0.908  0.650  0.193  0.185  0.032  0.017         2.984
6         1      0.408  0.400  0.091  0.085                2.984
9, 10     0.646  0.479  0.275  0.200  0.042  0.034  0.003  2.680

Molecule  4.924  2.493  1.254  0.647  0.212  0.078  0.014  19.625 (molecular ID)


GRAPH THEORETICAL CORRELATION OF THE ACTIVITIES OF IMIDAZOLIDINES

The emphasis in this article is on the advantages of graph theoretical descriptors in comparison with the traditional QSAR descriptors. Table 3 illustrates how a graph theoretical scheme naturally incorporates heteroatoms, but the task of finding optimal ‘diagonal’ contributions for various heteroatoms, or even for the same heteroatom in different environments, remains to be studied in greater detail. A preliminary examination reveals that positive diagonal elements decrease the path counts, while negative elements increase the magnitude of the weighted path counts. This suffices for our purpose of generating preliminary connectivity indices that discriminate positional isomers with variable heteroatom location. In Table 4 we list the leading connectivity indices for the eighteen compounds of interest, as derived by the ALLPATH program with an assumed diagonal entry x = -0.20 for each chlorine present. There is also an option to change the C-Cl bond weights, but at this stage we decided to keep the number of variables to a minimum. In order to emphasize the role of the substitution pattern, because we are dealing with compounds having different numbers of atoms, we focused attention on the eight-atom skeleton shown below, which is common to all the structures and is sensitive to substitutions.

[Structure: the common eight-atom skeleton]

TABLE 4

Leading connectivity indices for the eighteen compounds considered

No.  Compound          1-X     2-X     3-X
1    2,6-Cl2           4.278   2.015   0.978
2    2,4,6-Cl3         4.418   2.120   0.969
3    2,3-Cl2           4.278   2.015   0.963
4    2,6-Cl2-4-Me      4.384   2.092   0.957
5    2-Cl-6-Me         4.244   1.989   0.964
6    2,6-Me2           4.210   1.964   0.949
7    2,4-Cl2           4.262   2.036   0.982
8    2-Cl-4-Me         4.228   2.008   0.970
9    2,4-Cl2-6-Me      4.384   2.095   0.955
10   2,4-Me2-6-Cl      4.350   2.067   0.944
11   2,5-Cl2           4.262   2.036   0.982
12   2-Cl              4.122   1.939   0.982
13   2,6-Me2-4-Cl      4.350   2.069   0.942
14   2-Me-4-Cl         4.228   2.011   0.968
15   2,4,6-Me3         4.316   2.042   0.931
16   2,4-Me2           4.194   1.983   0.956
17   2-Me              4.088   1.914   0.966
18   Unsubstituted     3.966   1.869   0.979

Fig. 2. Plot of the connectivity index against log(1/ED). Open circles indicate compounds without chlorine substituents, singly crossed circles indicate compounds with a single chlorine substituent, doubly crossed circles indicate compounds with two chlorines, and the triply crossed circle indicates the compound with three chlorine heteroatoms.

The reported connectivity indices in Table 4 therefore represent fragment connectivities, i.e. they include only contributions from the above common eight atoms. The computed path numbers, however, involve from eight to eleven atoms, depending on the substitution pattern. In this way we have separated the combined influences of two structural features, size and shape, which will enable us to focus attention on the ‘shape’, i.e. the substitution pattern, and its role in the relative bioactivities of the compounds.

With a single graph theoretical descriptor, the connectivity index 1-X of Table 4, we obtain the correlation shown in Fig. 2. The correlation coefficient is R = 0.690 and the standard error of estimate is S = 0.712. This is visibly better than the best single property-based QSAR correlation, the one based on log P′, with R = 0.529 and S = 0.846. The correlation equation

$$\log(1/ED) = 5.781\,X - 24.1643$$

explains almost 50% of the variance in hypotensive activity and also shows that bond additivity (implied in the connectivity index) is not the only aspect of this particular structure-activity relationship. This may be contrasted with the log P′ correlation, which accounts for only 30% of the variance in hypotensive activity and shows that lipophilic behaviour is not the dominant contributor to the biological activity of clonidine-like imidazolidines.

What is the next best descriptor that will improve the correlation? One way to proceed is to examine the correlation predictions more closely and see whether a well-characterized subset of the compounds shows a greater departure from the correlation. By inspection of Fig. 2, which gives a plot of log(1/ED) against the connectivity index, we observe that the average values of log(1/ED) increase significantly with the number of chlorine atoms as substituents. Hence we may expect that inclusion of the molecular weight, which increases with the number of chlorine substituents, will improve the correlation. Alternatively, we may use the count of chlorines (which parallels molecular weight as a descriptor) to improve the correlation. This observation leads to the following two-descriptor correlation equation:

$$\log(1/ED) = 3.508\,X + 0.440\,N - 15.0101$$

with R = 0.764 and S = 0.656, where N, the number of chlorine atoms, is 0, 1 or 2. The improvement is not dramatic, but the correlation is significantly better than the quadratic correlation based on log P′ (with R = 0.647 and S = 0.793) or the quadratic correlation based on parachor (with R = 0.656 and S = 0.784), which similarly involve three terms in the correlation equations. Hence, again, we see that simple graph theoretical considerations produce visibly better results.
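Each correlation equation in this section is a least-squares fit of log(1/ED) on one or two descriptors, summarized by R and the standard error of estimate S. The minimal numpy sketch below (function name ours) uses the usual definitions, R² = 1 - SSE/SST and S = sqrt(SSE/(n - p - 1)) for n compounds and p descriptors; that the authors used exactly these definitions is an assumption here. The toy data in the usage lines are not from the paper.

```python
import numpy as np


def fit_ols(descriptors, y):
    """Least-squares fit y = b0 + b1*x1 + ... + bp*xp.

    descriptors: shape (n,) or (n, p); y: shape (n,).
    Returns (coefficients with intercept first, R, S), where S is the
    standard error of estimate sqrt(SSE / (n - p - 1)).
    """
    x = np.asarray(descriptors, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    y = np.asarray(y, dtype=float)
    n, p = x.shape
    design = np.column_stack([np.ones(n), x])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    sse = float(residuals @ residuals)
    sst = float(np.sum((y - y.mean()) ** 2))
    r = float(np.sqrt(max(0.0, 1.0 - sse / sst)))
    s = float(np.sqrt(sse / (n - p - 1)))
    return coef, r, s


# toy data (not from the paper): one descriptor
coef, r, s = fit_ols([0, 1, 2, 3], [0, 1, 1, 2])
print(coef, round(r, 4), round(s, 4))  # intercept 0.1, slope 0.6
```

With the 1-X column of Table 4 as the descriptor and log(1/ED) as y, a function of this kind performs the single-descriptor fit; adding the chlorine count N as a second column gives the two-descriptor form.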

Another look at the compounds which show a greater departure from the correlation line in Fig. 2 suggests that bond dipoles may play some role. Among the isomers having the same number of C-Cl bonds, those with bonds in the ortho position have greater activity than those with C-Cl bonds in meta or para positions. Consequently, one can visualise the resultant dipole vectors as pointing in the direction of the ‘shift’ of the points in the correlation. By using the magnitudes D of the dipoles (which are sensitive to the substitution mode) as a parameter, we also expect to improve the structure-activity correlation. Implementation of this observation leads to the expression

$$\log(1/ED) = 5.854\,X + 0.679\,D - 24.508$$

with R = 0.799 and S = 0.611. This particular two-descriptor correlation compares well with the two-descriptor correlation of Timmermans and van Zwieten.

The graph theoretical approaches not only have quantitative value; they also provide novel qualitative structural insights. In the above case we identified molecular weight and bond dipoles as potentially useful descriptors. Nevertheless, we should add that the quantitative results, impressive as they are, are not necessarily the best which the particular graph theoretical approach may yield. We have not attempted to optimize our heteroatom parameter (the diagonal entry in the associated adjacency matrix). That there is room for improvement can be seen by considering another choice for the diagonal entry parameter of chlorine. The value x = -0.40 gives a better (single-descriptor) correlation, log(1/ED) = 4.860 X - 20.531, with R = 0.750 and S = 0.651, as compared to the correlation derived with x = -0.20 (R = 0.690 and S = 0.712). This result, which is also not optimal, shows that a single graph theoretical descriptor can capture dominant structural features well. If a single graph theoretical descriptor can produce correlations which are better than alternatives using two or more traditional descriptors, it seems worthwhile to explore further the possibilities based on graph theoretical descriptors. Recently an approach to the construction of better single descriptors has been considered [68]. It appears that further improvements in structure-property and structure-activity studies are possible by following similar promising directions in modifying the functional dependence of the topological indices used. Traditional QSAR approaches lack this flexibility by virtue of being limited to molecular properties as a source of structural characterizations.


MULTIPLE REGRESSION USING HIGHER CONNECTIVITIES

A single best descriptor allows one to model a structure-activity study by considering the role of various ‘correction’ factors, as outlined above. An alternative approach is to use ‘higher-order’ descriptors, such as higher-order connectivities, paths of longer length, extended connectivities, etc. If, for the collection of compounds considered, such descriptors are not strongly intercorrelated, they may span the structure space adequately and hence produce impressive correlations. We want to end this exposition by showing correlations of the antihypertensive activities of clonidine-like imidazolines using longer (weighted) paths involving the particular encoding of chlorine heteroatoms. In Table 5 we collect the information on the correlations using paths of length one (the connectivity index 1-X), paths of length two (denoted as 2-X and corresponding to the connectivity index of second order) and paths of length three (denoted as 3-X and corresponding to connectivity indices of order three). The three connectivity indices 1-X, 2-X and 3-X have not been selected as the best three from a pool of possible indices, their reciprocals and other combinations, as has sometimes been the case in multiple regression analyses. They have rather been selected as the leading members of a sequence of weighted paths (higher connectivities).

TABLE 5

Predicted antihypertensive activities based on the connectivity indices 1-X, 2-X and 3-X derived from multiple regression and cross-validation

No.  Compound          Regression  Cross-validation  Experiment
1    2,6-Cl2            2.034       1.977             2.14
2    2,4,6-Cl3          1.460       1.478             1.41
3    2,3-Cl2            1.298       1.286             1.37
4    2,6-Cl2-4-Me       1.061       1.035             1.22
5    2-Cl-6-Me          1.372       1.451             1.12
6    2,6-Me2            0.697       0.627             0.85
7    2,4-Cl2            0.566       0.536             0.68
8    2-Cl-4-Me          0.111       0.061             0.68
9    2,4-Cl2-6-Me       0.850       0.901             0.57
10   2,4-Me2-6-Cl       0.459       0.448             0.52
11   2,5-Cl2            0.548       0.607             0.32
12   2-Cl               0.259       0.285             0.15
13   2,6-Me2-4-Cl       0.249       0.300             0.04
14   2-Me-4-Cl         -0.080      -0.084            -0.05
15   2,4,6-Me3         -0.150      -0.193            -0.07
16   2,4-Me2           -0.532      -0.527            -0.56
17   2-Me              -0.448      -0.406            -0.61
18   Unsubstituted     -2.076      -2.047            -2.10
R                       0.9773      0.9676
S                       0.2223      0.2475

Connectivity indices 1-X, 2-X and 3-X lead to quite impressive regressions. As stated earlier, 1-X already accounts for close to 50% of the variance. In combination with 2-X, the two-descriptor characterization of the compounds (three-parameter correlation equation) accounts for 60% of the variance (correlation coefficient R = 0.781). This is better than any two-parameter correlation based on traditional QSAR parameters, even including correlations using bond dipoles or molecular weights as descriptors in conjunction with the connectivity index. The improvement from adding 2-X to the existing correlation based on 1-X is substantial, even if not dramatic. Part of the reason for achieving only a partial improvement is that 1-X has already absorbed much of the correlation variance. However, if we now include 3-X, in addition to 1-X and 2-X, we obtain a correlation equation which accounts for 95.5% of the variance (a correlation coefficient of 0.977). The regression is also accompanied by an impressive reduction of the standard deviation, to S = 0.223. This particular result is better than a correlation based on four and five descriptors using any combination of apparently plausible physicochemical descriptors, such as log P′, ΔpKa, parachor, Taft's steric constants, molar refractions, etc., supplemented by quantum chemical parameters, such as HOMO and LUMO parameters and their derivatives.
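The percentages of variance quoted in this section are the squares of the correlation coefficients (R², the coefficient of determination). A quick check in Python, using R values reported in the text (the labels are ours):

```python
# Coefficient of determination R**2 for correlation coefficients quoted in the text
reported_r = {
    "1-X": 0.690,
    "log P'": 0.529,
    "1-X, 2-X": 0.781,
    "1-X, 2-X, 3-X": 0.9773,
}
explained = {name: r * r for name, r in reported_r.items()}
for name, share in explained.items():
    print(f"{name:14s} R = {reported_r[name]:.4f}  R^2 = {share:.3f}")
```

The four shares come out near 0.476, 0.280, 0.610 and 0.955, which the text rounds to almost 50%, 30%, 60% and 95.5%.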

The central finding, that the X indices provide a superior correlation of the antihypertensive clonidine data for the eighteen compounds chosen, appears to be correct, provided that a chance correlation does not play a role. In order to confirm this finding we undertook to examine whether the result would be upheld by cross-validation. In Table 5 we also report the outcome of the cross-validation, which leads to an overall coefficient of correlation of 0.968 with a standard error of estimate of 0.247. The result is particularly striking for this data set, because there are two extreme potency values which would be expected to give much trouble in cross-validation. The suspicion with which many people in the QSAR community regard graph theoretical approaches is based on misunderstandings of graphs and on a feeling that there is no physicochemical basis for connectivity correlations. Since “receptors surely do not perform edge counting”, skeptics feel that the correlations with graph indices which do exist are actually a consequence of correlations with some more ‘meaningful’ physicochemical property with which the graph indices happen to correlate. However, the result reported here cannot be understood in this way. With new statistical methods, such as the partial least-squares method, inclusion of many sets of highly intercorrelated parameters is no longer a problem, and combining graph indices with physicochemical indices in a single study is practical.
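Leave-one-out cross-validation of the kind reported in Table 5 refits the regression eighteen times, each time omitting one compound and predicting it from the remaining seventeen. The minimal numpy sketch below uses our own function names. Summarizing the left-out predictions by sqrt(1 - PRESS/SST), with PRESS the sum of squared prediction errors, is our assumption about how the overall cross-validated R was obtained; applied to the experimental and cross-validated values of Table 5 (copied into the code), it gives about 0.968, in line with the tabulated 0.9676.

```python
import numpy as np


def loo_predictions(descriptors, y):
    """Leave-one-out: refit without compound i, then predict compound i."""
    x = np.asarray(descriptors, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), x])  # intercept column first
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        preds[i] = design[i] @ coef
    return preds


def cross_validated_r(observed, predicted):
    """sqrt(1 - PRESS/SST), PRESS = sum of squared left-out prediction errors."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    press = float(np.sum((observed - predicted) ** 2))
    sst = float(np.sum((observed - observed.mean()) ** 2))
    return float(np.sqrt(max(0.0, 1.0 - press / sst)))


# Experiment and Cross-validation columns of Table 5 (compounds 1-18)
experiment = [2.14, 1.41, 1.37, 1.22, 1.12, 0.85, 0.68, 0.68, 0.57,
              0.52, 0.32, 0.15, 0.04, -0.05, -0.07, -0.56, -0.61, -2.10]
cross_val = [1.977, 1.478, 1.286, 1.035, 1.451, 0.627, 0.536, 0.061, 0.901,
             0.448, 0.607, 0.285, 0.300, -0.084, -0.193, -0.527, -0.406, -2.047]
print(round(cross_validated_r(experiment, cross_val), 4))
```

Passing the 1-X, 2-X and 3-X columns of Table 4 to `loo_predictions` together with the experimental values would generate a cross-validation column of the same kind.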


COMBINED  USE  OF  PHYSICOCHEMICAL  AND  GRAPH 
THEORETICAL  DESCRIPTORS 

While the traditional QSAR parameters may have apparent advantages in some applications, in this particular study, where several factors contribute to the overall molecular behavior, it is difficult even to speculate on the importance of individual physicochemical descriptors. On the other hand, graph theoretical descriptors can not only do the same job, they can accomplish it impressively better. Successful graph theoretical correlations, of course, do not signal the termination, or even a diminution of the importance, of traditional approaches; rather, they indicate the beginning of a novel alternative, sending a signal for attention. Certainly, one needs to accumulate more experience and additional insights into the potential of the outlined approach. We do not even claim any general suitability of the approach outlined for the study of structure-activity phenomena involving heteroatoms. Even less do we want to leave the impression that the traditional approaches have no considerable, as yet untapped, potential alongside graph theoretical approaches in QSAR. In fact, we believe that combined approaches using molecular properties, quantum chemical parameters and well-selected graph theoretical descriptors are likely not only to produce superior correlations but also to do so in the most efficient way. While this paper has demonstrated some advantages of mathematical descriptors as opposed to physicochemical descriptors in this particular application, the advocacy of one set of descriptors does not preclude the use of other sets of descriptors. Moreover, any claim to a general superiority of one kind of descriptor over another, even if based on a larger body of results, overlooks the possibility that as yet unexplored descriptors (properties or graph invariants) may surpass in quality those considered hitherto. It seems that the most pragmatic approach at this time is to combine physicochemical descriptors with graph theoretical descriptors, a course which has already received some support [69-71]. This then represents a generalization of the more common current practice in which physicochemical descriptors are combined with quantum chemical descriptors. Such generalized approaches are likely to result not only in better but also in simpler correlations than approaches using only one type of descriptor.

To illustrate the relationship between properties and connectivities as descriptors for the eighteen compounds considered, we report in Table 6 correlations of traditional QSAR descriptors against the connectivity index X. Such correlations may assist one in selecting graph theoretical and physicochemical descriptors in ‘admixture’. We find that X and parachor produce quite a good correlation (R = 0.965), not unexpectedly in view of the interpretation of the parachor in terms of molecular surface. The magnitudes of molecular surface area are well simulated by the relative magnitudes of the connectivity index [52]. A quite good correlation (R = 0.950) was also obtained between X and the hydrophobic constants (summation over the substituent π values). The correlation between X and the Taft substituent steric constants is fair, not as good as for the hydrophobic constants or parachor, but it still suggests that over 75% of the variance is accounted for by X (R = 0.881). On the other hand, the correlations between X and the quantum chemical HOMO parameters (as well as the derived EE parameters) are nonexistent (R = 0.114 and R = 0.070, respectively). These molecular orbital descriptors (for the set of structures considered) apparently have ‘nothing in common’ with the bond additivities implied by the connectivity index. Hence, they illustrate descriptors which, figuratively speaking, are ‘orthogonal’ to the connectivity index. They supply additional ‘directions’ in correlations if, on their own, they show some correlation with the property considered. We should emphasize that the use of R, the correlation coefficient, or R², the coefficient of determination, as the sole criterion for the quality of a regression is known to be deficient and can be downright misleading. Hence conclusions based on R or R² have to be taken with due reservation. It is desirable to substantiate such correlations with other independent statistical criteria, such as the magnitudes of the standard errors, F-tests, cross-validation, etc.

TABLE 6

Correlations between the various physicochemical descriptors and the connectivity indices for the eighteen compounds considered

Descriptor   R      S      Coefficient  Constant
Parachor     0.965  8.92   280.9        -1032.8
Σπ           0.950  0.169  4.397        -17.365
Es           0.881  0.463  -7.350       28.953
log P′       0.715  0.734  6.397        -27.603
pKa          0.430  0.837  -3.392       13.757
HOMO         0.114  0.161  0.158        -12.330
EE           0.071  0.146  0.089        7.2519

TABLE 7

Two-parameter correlations combining the connectivity index and selected physicochemical descriptors

Regression                            R      S
4.154 X - 0.489 pKa - 17.574          0.808  0.599
2.607 X + 0.500 log P′ - 10.465       0.786  0.628
6.181 X - 2.212 HOMO - 51.654         0.782  0.633
5.604 X + 1.988 EE - 38.636           0.751  0.671
11.052 X - 1.385 [?] - 44.661         0.701  0.725
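Of the criteria listed above, the F-test is easy to attach to any of the correlations reported here, because the overall regression F follows from R, the number of compounds n and the number of descriptors p: F = (R²/p)/((1 - R²)/(n - p - 1)). Unlike R alone, it penalizes additional descriptors through the degrees of freedom. A minimal sketch (function name ours), applied to two correlations from the text with n = 18; the resulting values would still have to be compared with tabulated critical values of F(p, n - p - 1), which are not reproduced here.

```python
def f_statistic(r, n, p):
    """Overall F for a regression with p descriptors fitted to n compounds."""
    r2 = r * r
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# correlations reported in the text, n = 18 compounds
for label, r, p in [("1-X", 0.690, 1), ("1-X, 2-X, 3-X", 0.9773, 3)]:
    print(f"{label:14s} F({p}, {18 - p - 1}) = {f_statistic(r, 18, p):.1f}")
```

The single-descriptor correlation gives F(1, 16) of about 14.5, and the three-descriptor correlation F(3, 14) of about 99.3.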

In Table 7 we show several ‘mixed’ correlations based on the connectivity index X and a selected property as descriptors. We see that when X is combined with the quantum chemical descriptors HOMO and EE, fair correlations result (R = 0.782 and R = 0.751, respectively). Comparison of the correlations in Table 6 and Table 7 gives insight into the role that some physicochemical descriptors play in multiple regressions. We see that there is a fair, but not satisfactory, correlation between log P′ and X, the correlation coefficient being R = 0.715. Combined, log P′ and X then give a better correlation, though the improvement appears not to be dramatic (R = 0.786). Because log P′ alone does not perform well (R = 0.529), it seems that in this particular application to clonidine-like compounds log P′ owes its correlation ‘power’ to a partial parallelism with X. However, the parts in which X differs from log P′ appear relevant for the particular correlation. The situation can be contrasted with the use of ΔpKa as an additional physicochemical descriptor. We see that ΔpKa combined with X produces a good correlation (R = 0.808); in this case, however, the improvement in the correlation is more substantial. This should not be surprising in view of the limited correlation between ΔpKa and X (R = 0.430). It implies a lesser ‘duplication’ between X and ΔpKa on the one hand, while on the other the improved correlation coefficient in the combined regression points to a role of ΔpKa, which alone shows a poor correlation (R = 0.482), as a descriptor complementary to X rather than competing with it, i.e. the two differ in structurally relevant features.


CONCLUDING  REMARKS 

The complexity of structure-activity studies is enormous, and different methodologies, even if addressing limited aspects of the QSAR problem, ought to be exhaustively explored and combined if possible. We have demonstrated, albeit on a single case of hypotensive clonidine-type compounds, that graph theoretical descriptors not only have the potential to describe structural variations in molecules with ‘floating’ heteroatoms, but that the accompanying descriptors are superior to any well-tested combination of traditional QSAR descriptors. The result ought to draw attention to mathematical descriptors, while at the same time the use of physicochemical descriptors is not discouraged. It should be superfluous to add that mathematical descriptors, of which graph theoretical invariants are illustrations, have an important advantage: an explicit structural interpretation. By contrast, many quantum chemical descriptors and molecular properties used as descriptors are highly convoluted and do not point directly to simple structural features as the dominant components of a correlation.

Pragmatism suggests that, at least at the present time, before we fully understand the intricate interrelationship of structure and properties, the best results may follow when both sets of descriptors are combined, by ‘mixing’ the two points of view. Be that as it may, it is opportune to end this article with a quote from Max Planck [72], intended for those who continue to be skeptical regarding graph theoretical methods:

“…the experimenter cannot afford to close his eyes to a new discovery, obtained from another point of view, which will not fit his own ideas, nor must he treat it as unimportant, if not incorrect.”

One should not need to add that graph theoretical indices, being mathematical constructions, cannot be incorrect! They can be useful or useless, but not incorrect, and we leave it to readers to decide which is the case.
The ligand-field regime defines the domain of applicability and the underlying reasons for the empirical success of ligand-field analysis. This article reviews the structural connections between quantum chemistry at large and the phenomenology of the ligand-field method. These connections provide a sound basis for the chemical interpretation of ligand-field parameters. Differences between ligand-field and molecular-orbital approaches are identified.


THE  LIGAND-FIELD  FORMALISM 

Ligand-field theory (LFT) addresses the spectroscopic and paramagnetic properties associated with open d or f electron shells in transition-metal complexes. It is parametric. We require of the models of such a theory that all appropriate electronic properties be reproduced essentially quantitatively for object systems regardless of molecular geometry, coordination number, or $d^n$ ($f^n$) configuration, on the same footing, and that the optimized parameters affording that reproduction be relatable, both empirically and structurally, to chemical concepts established by other means. Hundreds of ligand-field analyses of paramagnetic susceptibility, electron-spin-resonance g values, ‘d-d’ and ‘f-f’ transition energies, intensity distributions, and their natural or magnetic circular dichroism have satisfied these criteria. It is crucial to observe that, within its proper domain or ‘regime’, LFT works, because, at first sight, it ought not to.

LFT developed from crystal-field theory (CFT). Within that approach, d (f) electron energies (say) are calculated by diagonalization of the appropriate d (f) basis under the crystal-field Hamiltonian,

$$\mathcal{H}_{\mathrm{CF}} = \sum_{i<j}^{N} \frac{e^{2}}{r_{ij}} + V_{\mathrm{CF}} \qquad (1)$$

in which two-electron energies are accounted for by the Coulomb operator and one-electron energies by the crystal-field potential, $V_{\mathrm{CF}}$. Various models of the electrostatic, classical potential have been entertained, ranging from ligands as point charges or point dipoles to spatially extended charge distributions. In each case, all operators of eq. (1) are explicit, involving real bond lengths and charges. The d (f) basis is equally explicit: for example, one might employ the 3d functions of Clementi et al. [1] for cobalt as a dipositive cation. While the qualitative, symmetry aspects of CFT remain as useful as ever, the quantitative predictions of splitting parameters and accounts of the spectrochemical series, for example, were recognized to be hopeless almost from the beginning.
1935  marks  the  year  in  which  Van  Vleck  (2,3) 
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resolved  intriguing  conflicts  in  the  contemporary 
literature  and  introduced  amendments  to  CFT 
that  defined  the  birth  of  LFT.  In  essence, 
acknowledging  the  covalcncy  that  undoubtedly  ex¬ 
ists  in  all  transition-metal  complexes,  he  proposed 
LFT  as  an  isomorphous  approach  to  CFT  in  which 
the  operators  of  the  ligand-field  Hamiltonian, 

$$\mathcal{H}_{\mathrm{LF}} = \sum_{i<j} U(i,j) + \sum_{i} V_{\mathrm{LF}}(i) \qquad (2)$$

are to be taken as effective operators and ligand-field splittings to be regarded as parameters. Today, we refer to the two-electron energies as computed with an effective, or screened, Coulomb operator, $U(i,j)$, and the one-electron energies as ligand-field parameters of the effective ligand-field potential, $V_{\mathrm{LF}}$. It is also to be recognized that the only part of the basis functions that is employed explicitly in ligand-field calculations is the angular property. Matrix elements of functions built from pure d ($l = 2$) or f ($l = 3$) orbitals under $\mathcal{H}_{\mathrm{LF}}$ are manipulated within LFT: any differences between the first and second row of the d block, for example, are left to emerge in the parameters of the system. Altogether, therefore, in LFT we employ effective operators within a basis whose radial character is left implicit. One immediate consequence of these differences between CFT and LFT is the change from (calculable) free-ion, two-electron energies, like $B_0$ and $C_0$ in Racah’s notation, to parametric quantities like $B$, $C$ and the nephelauxetic effect.
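As a small illustration of the parametric quantities just mentioned, the sketch below converts Condon-Shortley integrals $F_2$ and $F_4$ into Racah parameters using the standard d-shell relations $B = F_2 - 5F_4$ and $C = 35F_4$, and then forms the nephelauxetic ratio $\beta = B/B_0$. The function names and all numerical values are hypothetical, chosen only for illustration; in LFT the $B$ and $C$ of a complex are fitted parameters rather than computed quantities.

```python
def racah_from_condon_shortley(f2: float, f4: float) -> tuple[float, float]:
    """Racah B and C for a d^n configuration from Condon-Shortley F2 and F4."""
    b = f2 - 5.0 * f4
    c = 35.0 * f4
    return b, c


def nephelauxetic_ratio(b_complex: float, b_free_ion: float) -> float:
    """beta = B(complex) / B0(free ion); beta < 1 signals a nephelauxetic reduction."""
    if b_free_ion <= 0.0:
        raise ValueError("free-ion B0 must be positive")
    return b_complex / b_free_ion


# Hypothetical values in cm^-1, for illustration only (not taken from the article).
b0, c0 = racah_from_condon_shortley(f2=1400.0, f4=80.0)
beta = nephelauxetic_ratio(b_complex=850.0, b_free_ion=b0)
print(f"B0 = {b0:.0f} cm^-1, C0 = {c0:.0f} cm^-1, beta = {beta:.2f}")
```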

LFT and CFT are isomorphous in the way they formally separate one- and two-electron effects and by their operation within a pure d (or f) basis. No explicit recognition is made of metal s or p functions, or of ligand orbitals. They are thus quite unlike molecular-orbital (MO) theory. Despite Van Vleck’s illustration [2] of the effects of covalency upon splitting factors by reference to MO theory in his famous 1935 paper, it is quite incorrect to view LFT as MO theory applied to transition-metal complexes. LFT and MO theory do not map onto one another. Over the past ten years, Woolley and Gerloch [4-7] resolved to uncover the underlying reasons for the successes of LFT and so to provide a defensible physical basis for the interpretation of its parameters. These interrelated aims are best reviewed separately, first in terms of a many-electron basis and then with respect to the one-electron matrix elements that define ligand-field parameters.

PROJECTION  ONTO  A  d  ORBITAL  BASIS 

The focus on a d or f basis is sharpened by a review of Löwdin’s partitioning theory [8]. The Schrödinger equation for some full many-electron problem is written

$$\mathcal{H}\Psi = E\Psi \qquad (3)$$

Expanding the eigenvectors within a freely chosen basis $\{\Phi\}$ of infinite size,

$$\Psi = \sum_{k} c_{k}\,\Phi_{k} \qquad (4)$$

and defining

$$H_{kl} = \langle \Phi_k | \mathcal{H} | \Phi_l \rangle \qquad (5)$$

we obtain the Heisenberg matrix representation of eq. (3):

$$H c = E c \qquad (6)$$

Suppose we partition the basis $\{\Phi\}$ into two groups, a and b, of dimension $N_a$ and $N_b$, respectively. $N_b$ will, in general, be infinitely large. The infinitely numerous eqs. (6) may be partitioned similarly:

$$H_{aa}c_a + H_{ab}c_b = E c_a \qquad (7)$$

$$H_{ba}c_a + H_{bb}c_b = E c_b \qquad (8)$$

where $c_a$ is a vector of dimension $N_a$ and $H_{aa}$ a square matrix of that dimension. The vector $c_b$ and the matrix $H_{bb}$ are both of (infinite) dimension $N_b$, and $H_{ab}$ is rectangular. Provided the inverse may be defined, we rewrite eq. (8) as

$$c_b = (E\,1_{bb} - H_{bb})^{-1} H_{ba}\,c_a \qquad (9)$$

and substitute it into eq. (7) to give

$$H_{aa}c_a + H_{ab}(E\,1_{bb} - H_{bb})^{-1}H_{ba}\,c_a = E c_a \qquad (10)$$

This comprises a set of $N_a$ equations of the form

$$H'_{aa}c_a = E c_a \qquad (11)$$

with

$$H'_{aa} = H_{aa} + H_{ab}(E\,1_{bb} - H_{bb})^{-1}H_{ba} \qquad (12)$$

where $1_{bb}$ is a unit matrix of dimension $N_b \times N_b$.
Solution of the secular determinantal equation,

$$\left| H'_{aa}(E) - E\,1_{aa} \right| = 0 \qquad (13)$$

yields $N_a$ eigensolutions whose energies are identically equal to $N_a$ eigenvalues of eq. (3), with eigenvectors expressed as (finite) combinations of the sub-basis $\{\Phi_a\}$.
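The partitioning can be checked numerically on a finite toy problem. The sketch below (Python with NumPy) uses a random real-symmetric matrix as an illustrative stand-in for $H$, not a physical Hamiltonian, and the helper name `effective_hamiltonian` is ours. It builds $H'_{aa}(E)$ of eq. (12) and confirms that every exact eigenvalue $E$ of the full matrix is also an eigenvalue of $H'_{aa}(E)$ evaluated at that same $E$, which is the content of eq. (13).

```python
import numpy as np


def effective_hamiltonian(h: np.ndarray, n_a: int, energy: float) -> np.ndarray:
    """H'_aa(E) = H_aa + H_ab (E 1_bb - H_bb)^-1 H_ba, eq. (12), for the first n_a functions."""
    h_aa, h_ab = h[:n_a, :n_a], h[:n_a, n_a:]
    h_ba, h_bb = h[n_a:, :n_a], h[n_a:, n_a:]
    n_b = h_bb.shape[0]
    # Solve (E 1_bb - H_bb) X = H_ba rather than forming the inverse explicitly.
    folded = np.linalg.solve(energy * np.eye(n_b) - h_bb, h_ba)
    return h_aa + h_ab @ folded


rng = np.random.default_rng(0)
n, n_a = 8, 3
m = rng.normal(size=(n, n))
h = 0.5 * (m + m.T)                  # toy real-symmetric matrix standing in for H
exact = np.linalg.eigvalsh(h)        # eigenvalues of the full problem, eq. (6)

residuals = []
for e in exact:
    h_eff = effective_hamiltonian(h, n_a, e)
    # E satisfies |H'_aa(E) - E 1_aa| = 0 exactly when E is an eigenvalue of H'_aa(E).
    residuals.append(np.min(np.abs(np.linalg.eigvals(h_eff) - e)))
    print(f"E = {e: .5f}   min |eig(H'_aa(E)) - E| = {residuals[-1]:.1e}")
```

The other $N_a - 1$ eigenvalues of $H'_{aa}(E)$ at a given $E$ are not, in general, exact levels; each exact level is recovered only when $H'_{aa}$ is evaluated at that level's own energy.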

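The same formal manipulations may be expressed within the Schrödinger representation [4,7] by defining a projection operator $P_a$ onto the subspace $\{\Phi_a\}$:

$$P_a = \sum_{i}^{N_a} |\Phi_i\rangle\langle\Phi_i| \qquad (14)$$

together with $Q_b$ onto the orthogonal, complementary subspace $\{\Phi_b\}$:

$$Q_b = 1 - P_a \qquad (15)$$

and thence by construction of a finite-dimensional Schrödinger equation,

$$\left(\mathcal{H}'(E) - E\right) P_a\Psi = 0 \qquad (16)$$

with

$$\mathcal{H}'(E) = P_a\left[\mathcal{H} + \Delta\mathcal{H}(E)\right]P_a \qquad (17)$$

where

$$\Delta\mathcal{H}(E) = \mathcal{H}Q_b\left(E - Q_b\mathcal{H}Q_b\right)^{-1}Q_b\mathcal{H} \qquad (18)$$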

We recognize, of course, that the formal manipulations that produced eqs. (16)-(18) involve no approximation of the full many-electron problem (eq. (3)) whatever. They merely project the infinitely large problem onto a finite basis $\{\Phi_a\}$ while ‘folding in’ all contributions from the complementary subspace $\{\Phi_b\}$ into the operator $\Delta\mathcal{H}(E)$. Furthermore, this reformulation does nothing to assist the solution of the many-electron Schrödinger equation, for the computation of $\Delta\mathcal{H}(E)$ is every bit as formidable a task as the original problem. It can, however, suggest a useful avenue for approximation.


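For example, we can write the identity

$$E_i = \frac{\langle P_a\Psi_i | \mathcal{H}'(E_i) | P_a\Psi_i \rangle}{\langle P_a\Psi_i | P_a\Psi_i \rangle} \qquad (19)$$

for the $i$th eigenvalue. Here we recognize that such is the tactic of LFT if we take $\{\Phi_a\}$ as functions built from pure d (f) orbitals and $\mathcal{H}'$ as

$$\mathcal{H}'(E) \approx \mathcal{H}_{\mathrm{LF}} \qquad (20)$$

However, $\Delta\mathcal{H}(E)$ of eq. (18) is an energy-dependent operator, so that the identities represented by eq. (19) are different for each eigensolution (each $i$); that is, $\mathcal{H}'(E_i)$ in eq. (19) is different for each solution.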
By contrast, the procedures of LFT are such that one implicitly considers one and the same effective operator $\mathcal{H}_{\mathrm{LF}}$ throughout the manifold of d-based states that co-define the ‘ligand-field regime’. Were it otherwise, one would not exploit a single set of parameters (matrix elements of $\mathcal{H}_{\mathrm{LF}}$) throughout the regime. And the whole point of the ligand-field parametric approach is to account for the splittings (and associated properties) of the manifold of d (f) states simultaneously with one set of variables. So here is the root of one’s surprise that LFT works. That it does indeed work (that one may employ some mean ligand-field Hamiltonian and thence a mean parameter set with remarkably consistent efficacy) must be attributed to Nature providing suitable and particular circumstances. Their provision is not within the power of the user.
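Why a single, mean operator can serve a whole manifold of states can be illustrated with the same kind of toy matrix model. In the sketch below (Python with NumPy), the block sizes, the energy gap and the couplings are arbitrary assumptions, and `effective_hamiltonian` is a hypothetical helper implementing eq. (12). A five-level ‘d-like’ block a is coupled to a block b lying far below it. A single $H'_{aa}(\bar{E})$, evaluated once at the mean energy $\bar{E}$ of the five levels, is compared with the exact levels. When b is weakly coupled the error is tiny, and it grows as the coupling grows.

```python
import numpy as np


def effective_hamiltonian(h: np.ndarray, n_a: int, energy: float) -> np.ndarray:
    """H'_aa(E) of eq. (12): the b block folded into the first n_a basis functions."""
    h_aa, h_ab = h[:n_a, :n_a], h[:n_a, n_a:]
    h_ba, h_bb = h[n_a:, :n_a], h[n_a:, n_a:]
    folded = np.linalg.solve(energy * np.eye(h_bb.shape[0]) - h_bb, h_ba)
    return h_aa + h_ab @ folded


def mean_operator_error(coupling: float, gap: float = 20.0,
                        n_a: int = 5, n_b: int = 10, seed: int = 1) -> float:
    """Largest error made by using one H'_aa(E_mean) for all n_a 'd-like' levels (toy model)."""
    rng = np.random.default_rng(seed)
    h_aa = np.diag(rng.uniform(-1.0, 1.0, n_a))         # 'd-like' levels near zero
    h_bb = np.diag(-gap + rng.uniform(-1.0, 1.0, n_b))  # complementary levels far below
    h_ab = coupling * rng.normal(size=(n_a, n_b))
    h = np.block([[h_aa, h_ab], [h_ab.T, h_bb]])

    exact_d = np.sort(np.linalg.eigvalsh(h))[-n_a:]     # the n_a highest levels are the 'd-like' ones
    e_mean = exact_d.mean()
    approx = np.sort(np.linalg.eigvalsh(effective_hamiltonian(h, n_a, e_mean)))
    return float(np.max(np.abs(approx - exact_d)))


for c in (0.1, 0.5, 2.0):
    print(f"coupling {c:4.1f}: max error of the mean operator = {mean_operator_error(c):.2e}")
```

In this caricature, the weak coupling of the d-like block to everything else plays the part of the ‘suitable and particular circumstances’; nothing in the construction itself guarantees it.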

Chemometrics and Intelligent Laboratory Systems

Fig. 1. Radial wavefunctions for (a) Werner-type complexes and (b) low-oxidation-state complexes of the first transition series.

Rather similar circumstances ensure the success of π electron theory in delocalized organic systems. There, one projects the many-electron problem onto a subspace of π functions. No explicit reference is made to the σ bonding framework or atomic core functions. In the manner of eq. (18), these are folded into an effective, mean Hamiltonian. Matrix elements of that mean Hamiltonian are parameterized in the Hückel model by the so-called Coulomb and resonance integrals, α and β. So LFT is to transition-metal chemistry what π electron theory is to delocalized organic systems. That both models work so well in their own domains is to be ascribed to the functions of their appropriate subspaces being largely uncoupled from all else.
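For comparison, the Hückel parameterization mentioned above fits in a few lines. The sketch below (Python with NumPy; the helper name `huckel_energies` is ours) builds the π-only Hückel matrix, with α on the diagonal and β between bonded neighbours, so that the σ framework enters only through the values of α and β, and then diagonalizes it. For butadiene it reproduces the familiar α ± 1.618β and α ± 0.618β.

```python
import numpy as np


def huckel_energies(n_atoms: int, alpha: float = 0.0, beta: float = -1.0,
                    ring: bool = False) -> np.ndarray:
    """Orbital energies (ascending) of a Huckel chain or ring of n_atoms p-pi centres."""
    h = alpha * np.eye(n_atoms)              # Coulomb integrals on the diagonal
    for i in range(n_atoms - 1):
        h[i, i + 1] = h[i + 1, i] = beta     # resonance integrals between bonded neighbours
    if ring and n_atoms > 2:
        h[0, -1] = h[-1, 0] = beta           # close the ring (e.g. benzene)
    return np.linalg.eigvalsh(h)


# With alpha = 0 and beta = -1 (beta is negative), energies are in units of |beta|.
print(np.round(huckel_energies(4), 3))             # butadiene
print(np.round(huckel_energies(6, ring=True), 3))  # benzene
```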

THE CHEMICAL SIGNIFICANCE OF THE LF EFFICACY

In chemical terms, one may view the ‘natural separation’ of the d basis in transition-metal complexes from the complementary subspace as an effective removal of the d functions from the valence shell. This is proposed strictly as a ‘zeroth-order’ viewpoint, for some mixing of the d orbitals with other functions does take place, as evidenced, for example, by the (small) breakdown of Laporte’s rule for ‘d-d’ intensities. Furthermore, this separation is proposed for Werner-type complexes (those involving metals in higher oxidation states, which form suitable objects for ligand-field study) but not for carbonyl chemistry or low-oxidation-state complexes. Radial forms of 3d, 4s and 4p functions are sketched in Fig. 1 for both types of complex. The view we take here of the Werner-type systems is that, rather like the way the 4f orbitals in lanthanide(III) complexes are well buried and uninvolved in bonding, the d orbitals are relatively ‘inner’ functions that overlap very poorly with functions offered by the ligands. Chemically, this view accords well with the stability of open d shells in these systems: consider, for example, the absence of free-radical behaviour of unpaired electrons in such complexes. By contrast, the much greater mixing between d, s and p orbitals in the more expanded electron clouds of very low-oxidation-state complexes defines a valence shell with all nine metal orbitals and chemistry dominated by the 18-electron rule.

Fig. 2. View of the bonding in higher-oxidation-state transition-metal complexes: (a) bond formation between metal and complete group of ligands; (b) secondary perturbation of the mean d orbitals by bond orbitals (ligand-field shift).

We therefore commend a view [9] of electron interactions in Werner-type complexes as involving two notional steps. In the primary step, firm bonds between metal and ligands are formed by overlap of metal s and/or p orbitals with appropriate ligand functions. The second, or smaller, perturbation is the interaction of the d shell with the bonding functions so formed, as sketched in Fig. 2. LFT and the experimental properties it addresses are to be seen as part of this second step. Of course, the very interactive nature of this process means that while d orbital energies and d electron distributions are affected by the bonding electrons, the bond orbitals are in turn affected by the d electrons. Fig. 2 is to be seen as the end product of such a cyclic process. In this way, the exigencies of the electroneutrality principle, for example, will have been satisfied and thence probed or reflected by the effects upon the d orbitals that we analyse by LFT.

ONE-ELECTRON  LIGAND-FIELD  PARAMETERS 

Parameters of the effective ligand-field potential are one-electron integrals. In order to gauge their chemical significance, we review an attempt to forge a link between one-electron theory and the many-electron formalisms above.

One-electron theory begins with the selection of a basis. The total freedom available in making this choice is not limited to the technical question of preferring hydrogenic functions to Slater-type orbitals (STOs) or to Gaussians, but includes the extent to which exchange and correlation effects are included at the outset. The basis functions are defined as eigenfunctions of the one-electron Hamiltonian,

$$h = T + U \qquad (21)$$

where $T$ is the usual kinetic energy Laplacian and $U$ is some form of potential energy operator. In MO calculations, various forms of $U$ have been adopted: in early Hartree computations $U$ excluded all reference to exchange and correlation; in Hartree-Fock, a particular scheme for including exchange is adopted; in Xα calculations, a quite different approach defines a basis which includes some account of both exchange and correlation effects. Subsequent computation of many-electron molecular properties in terms of the various orbital bases requires varying (and usually extremely extensive) ‘corrections’ to provide an acceptable account of all exchange and correlation.

In one sense, however, there exists a ‘best’ choice of orbital basis which, apart from trivial unitary transformations, is unique. That such a choice exists is established by density functional theory [10,11], the central theorem of which shows that there exists a set of orbitals $\{\phi\}$ for the system ground state from which one may compute the exact total electron density simply by forming the sum $\sum_i \phi_i^{*}\phi_i$ over populated orbitals; no further ‘corrections’ are required. Unfortunately, the theorem provides no practical help in calculating what these ‘best orbitals’ are, so the many-electron problem remains as difficult as ever. However, their existence provides the basis of a structural analysis of a model like LFT.
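The density-from-orbitals step is itself elementary once orbitals are in hand. The sketch below (Python with NumPy) uses normalized one-dimensional particle-in-a-box functions purely as toy stand-ins for the ‘best orbitals’, which, as just noted, are not known in practice; the occupation numbers are hypothetical. It forms $\rho = \sum_i n_i|\phi_i|^2$ on a grid and checks that the density integrates to the electron count.

```python
import numpy as np


def box_orbital(n: int, x: np.ndarray, length: float = 1.0) -> np.ndarray:
    """Normalized particle-in-a-box orbital: a toy stand-in for a 'best orbital'."""
    return np.sqrt(2.0 / length) * np.sin(n * np.pi * x / length)


def density(occupations: dict[int, float], x: np.ndarray, length: float = 1.0) -> np.ndarray:
    """rho(x) = sum_i n_i |phi_i(x)|^2 over populated orbitals, with no further corrections."""
    rho = np.zeros_like(x)
    for n, occ in occupations.items():
        rho += occ * np.abs(box_orbital(n, x, length)) ** 2
    return rho


def integrate(y: np.ndarray, x: np.ndarray) -> float:
    """Trapezoidal rule, written out to avoid depending on a particular NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


x = np.linspace(0.0, 1.0, 2001)
rho = density({1: 2.0, 2: 2.0, 3: 1.0}, x)   # hypothetical occupations: 5 electrons
print(f"integrated density = {integrate(rho, x):.6f}")
```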

Let us suppose we have the form of the potential energy operator in eq. (21) that leads to the ‘best orbitals’ for the system; it takes the form of a functional of the total electron density $\rho$:

$$U = U[\rho] \qquad (22)$$

and, for the ground state at least, the one-electron Hamiltonian (eq. (21)) defines the solution to the given problem entirely. Now we must recall that the ligand-field procedures and eq. (2) explicitly separate d-d interactions from all others. In mimicking this artificial but established structure of LFT, we define a new potential energy operator $V$ as a functional of the total electron density minus that of the d electrons:

$$V = U[\rho - \rho_d] \qquad (23)$$

That d electron density remains to be defined, cyclically, in a moment. We thus construct an orbital basis of ligand-field orbitals (LFOs) as notional solutions to the one-electron Hamiltonian,

$$\mathcal{H} = T + V \qquad (24)$$

The LFO is then expressed as a linear combination of fragment orbitals, rather as molecular orbitals may be expanded in the linear combination of atomic orbitals (LCAO) scheme. However, the fragment orbitals are chosen here in a different way. We divide $V$ into spherical and aspherical parts, $\langle V\rangle$ and $V'$ respectively:

$$V = \langle V\rangle + V' \qquad (25)$$

Then, solutions of the mean one-electron Hamiltonian,

$$\mathcal{H}^{(0)}\phi = (T + \langle V\rangle)\,\phi = \epsilon\,\phi \qquad (26)$$

take the usual central-field form,

$$\phi = R_{nl}(r)\,Y_{lm}(\theta, \varphi) \qquad (27)$$

so spanning a series of functions we may label as s, p, d, f, .... We select the d functions of the mean Hamiltonian $\mathcal{H}^{(0)}$ (which we henceforth call the mean d orbitals of the system) as one part of the fragment orbitals of the LFOs, which latter are exact solutions of the Hamiltonian of eq. (24). So

$$\phi_{\mathrm{LFO}} = a\,\phi_d + \sum_i b_i\,\phi_i \qquad (28)$$

where the $\phi_i$ represent all other functions required to span the rest of the LFO, that is, the admixture brought about by $V'$, the aspherical part of $V$. It is the electron density in the mean d orbitals $\{\phi_d\}$ that is subtracted in the definition of $V$ in eq. (23). Though notional, the procedures so far are exact. However, to make contact with the reality of LFT, we must now approximate and presume that the ‘best orbitals’ for all excited ligand-field states (but not for others) are somewhat similar to each other and to those of the ground state; in short, that the ‘mean d orbitals’ are also a mean throughout the ligand-field regime. Insofar as this assumption is satisfactory, LFT should ‘work’; insofar as LFT works, the assumption may be deemed to be satisfactory. At this stage, notice that the precise radial form of the mean d orbitals (or, of course, the mean f orbitals if one is dealing with a lanthanide problem), though unknown to us in practice, is determined by and for the system in question. In this connection, recall that the radial part of the ligand-field d basis is left implicit in ligand-field procedures.

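In principle we now have the basis for interpreting one-electron ligand-field parameters through the relationship

$$\langle \phi_d | V_{\mathrm{LF}} | \phi_{d'} \rangle = \langle \phi_{\mathrm{LFO}} | \mathcal{H} | \phi'_{\mathrm{LFO}} \rangle \qquad (29)$$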

However, little chemical transparency would derive from a study of this relationship, for the LFOs refer to the molecule as a whole. At this point, one recognizes that one of the most powerful ideas throughout chemistry is the notion of the functional group. The power of modern ligand-field analysis is only realized when this notion is blended with the theoretical structure we have outlined above: this blend defines so-called cellular ligand-field (CLF) theory [5,6].

In the CLF model, we consider the space around the metal as divided up into N contiguous volumes or ‘cells’. In general (though there is an important exception we have no space to discuss here), we arrange these cells so as to enclose one M-L ligation each. We then consider the total molecular effective ligand-field potential as a simple sum of all cellular potentials. Part of that supposition is the idea that the sources of the effective potential in any one cell are physically located in that cell. Such is not the case in CFT, for the potential of any point charge is sensed in all regions of space. Here we presume that dielectric screening by all electrons in the bonds and cores is such as to render effective ligand-field potentials spatially local. Consider, then, the effects of this local effective potential upon the metal mean d orbitals in a given cell.
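The additive, cell-by-cell bookkeeping can be sketched numerically. The example below (Python with NumPy) is a simplification under stated assumptions. It keeps only a σ parameter per cell, uses the standard angular factors of the five real d orbitals for a ligand along a unit vector, and sums the local contributions $e_\sigma F F^{\mathsf{T}}$ into a global 5 × 5 matrix. The function names are hypothetical, and π contributions, which need the corresponding π angular factors, are omitted. The same summation bookkeeping is used in the angular overlap model, whose physical basis is different. For six equal ligands on the Cartesian axes it gives the octahedral σ pattern (two orbitals at $3e_\sigma$, three at 0), and for a regular tetrahedron three orbitals at $\tfrac{4}{3}e_\sigma$.

```python
import numpy as np

D_ORBITALS = ("z2", "x2-y2", "xy", "xz", "yz")


def sigma_factors(direction) -> np.ndarray:
    """Angular sigma factors of the five real d orbitals for a ligand along `direction`.

    The factors are the real d angular functions, scaled so that d_z2 gives 1 along z.
    """
    x, y, z = np.asarray(direction, dtype=float) / np.linalg.norm(direction)
    s3 = np.sqrt(3.0)
    return np.array([
        0.5 * (3.0 * z * z - 1.0),    # d_z2
        0.5 * s3 * (x * x - y * y),   # d_x2-y2
        s3 * x * y,                   # d_xy
        s3 * x * z,                   # d_xz
        s3 * y * z,                   # d_yz
    ])


def cellular_sum(cells) -> np.ndarray:
    """Global 5x5 matrix as a plain sum over cells of e_sigma * F F^T (sigma-only toy)."""
    v = np.zeros((5, 5))
    for direction, e_sigma in cells:
        f = sigma_factors(direction)
        v += e_sigma * np.outer(f, f)
    return v


octahedron = [((1, 0, 0), 1.0), ((-1, 0, 0), 1.0), ((0, 1, 0), 1.0),
              ((0, -1, 0), 1.0), ((0, 0, 1), 1.0), ((0, 0, -1), 1.0)]
tetrahedron = [((1, 1, 1), 1.0), ((1, -1, -1), 1.0), ((-1, 1, -1), 1.0), ((-1, -1, 1), 1.0)]

for name, cells in (("octahedron", octahedron), ("tetrahedron", tetrahedron)):
    levels = np.linalg.eigvalsh(cellular_sum(cells))   # in units of e_sigma
    print(name, np.round(levels, 4))
```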

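After some simple algebra [6,7], which we do not review here, analysis of the relationship (29) within a single cell yields an expression for the energy shift of orbital $d_\lambda$ as

$$e_\lambda \approx \langle d_\lambda | \mathcal{H}^{(1)} | d_\lambda \rangle + \sum_{\chi} \frac{\langle d_\lambda | \mathcal{H} | \chi_\lambda \rangle \langle \chi_\lambda | \mathcal{H} | d_\lambda \rangle}{\epsilon_d - \epsilon_{\chi_\lambda}} \qquad (30)$$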

These orbital energies, $\{e_\lambda\}$, are the parameters of the CLF model. Here we write $d$ for the $\phi_d$ of eq. (29) and $\chi$ for functions built from the ‘rest’ functions $\phi_i$ of eq. (28). All functions are referred to the local, cellular frame and transform with symmetry $\lambda$ with respect to the local pseudosymmetry. $\epsilon_d$ is the energy of the mean d orbitals and $\epsilon_{\chi_\lambda}$ the mean, or expectation-value, energy of $\chi_\lambda$. The first term in eq. (30) is called the ‘static’ contribution and the second, the ‘dynamic’ contribution. It is sufficient for the present illustration to focus on an M-L ligation with local $C_{2v}$ pseudosymmetry; lower local ligation symmetries have been studied in detail and reviewed [12]. In $C_{2v}$ symmetry, $\lambda = \sigma$, $\pi_x$ or $\pi_y$; δ interactions are neglected. It has been shown that for $\lambda = \sigma$ the static contribution is likely to be several times smaller than the dynamic one and, for $\lambda = \pi$, that the static contribution should be negligible. Our discussion focuses, then, upon just the dynamic part of eq. (30). Both the total, $\mathcal{H}$, and aspherical, $\mathcal{H}^{(1)}$, parts of the Hamiltonian within the given cell transform totally symmetrically and so ensure the identical symmetry speciation of $d_\lambda$ and $\chi_\lambda$ in eq. (30). In $C_{2v}$ symmetry, therefore, a $d_\sigma$ orbital interacts exclusively with $\chi_\sigma$ orbitals, $d_{\pi x}$ with $\chi_{\pi x}$, and $d_{\pi y}$ with $\chi_{\pi y}$, as represented in Fig. 3. In short, the local cellular potential matrix is diagonal:


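$$\begin{array}{c|ccc} & d_\sigma & d_{\pi x} & d_{\pi y} \\ \hline d_\sigma & e_\sigma & 0 & 0 \\ d_{\pi x} & 0 & e_{\pi x} & 0 \\ d_{\pi y} & 0 & 0 & e_{\pi y} \end{array} \qquad (31)$$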


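with the local cellular parameters

$$e_\lambda = \langle d_\lambda | v^{c}_{\mathrm{LF}} | d_\lambda \rangle; \quad \lambda = \sigma, \pi_x, \pi_y \qquad (32)$$

where $v^{c}_{\mathrm{LF}}$ is the effective ligand-field potential in cell $c$. Taking eq. (31) together with eq. (30) and the remarks above, we have

$$e_\lambda \approx \sum_{\chi} \frac{\left| \langle d_\lambda | \mathcal{H}^{(1)} | \chi_\lambda \rangle \right|^{2}}{\epsilon_d - \epsilon_{\chi_\lambda}} \qquad (33)$$

and

$$v^{c}_{\mathrm{LF}} \approx \sum_{\lambda} \sum_{\chi} \frac{\mathcal{H}^{(1)} | \chi_\lambda \rangle \langle \chi_\lambda | \mathcal{H}^{(1)}}{\epsilon_d - \epsilon_{\chi_\lambda}} \qquad (34)$$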


Observe, in passing, how the effective ligand-field operator is energy dependent, but that this dependence is explicitly built into the ultimate parameterization. Further energy dependence, which is ignored, is implicit within the ≈ sign and in the concept of mean d orbitals.

Fig. 3. Second step of Fig. 2 within the local CLF scheme: (a) for σ bonding; (b) for $\pi_x$ bonding ($\pi_y$ bonding, in the plane normal to the paper, is similar). Panels: for ligand σ donors; for ligand π donors; for ligand π acceptors.

Now one can invoke simple chemical reasoning to simplify these sums for the purpose of interpretation. Thus, we observe that the dominant contributions to $e_\lambda$ in eq. (33) will be those with larger numerators and smaller denominators. $\mathcal{H}^{(1)}$ is the aspherical part of the Hamiltonian (potential) in that cell and so maximizes away from the metal core. Furthermore, it relates to the electron density of the complementary set (the ‘rest’) and so to all occupied ‘rest’ orbitals. Numerators in eq. (33) will therefore be largest when $\chi_\lambda$ maximizes near these regions. Denominators will be smallest for $\chi_\lambda$ closest in energy to the mean d orbitals. All in all, we expect $e_\lambda$ to be dominated by those $\chi$ which are most proximate to the d orbitals in both space and energy, that is, by the bond orbitals. We conclude that the sources of effective ligand-field potential are the bonding electrons and, in this sense, assert that LFT and observable ligand-field properties probe the underlying chemical bonds.
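The ‘largest numerator, smallest denominator’ argument applied to eq. (33) can be made concrete with a few made-up numbers. In the sketch below (Python), the couplings $\langle d_\lambda|\mathcal{H}^{(1)}|\chi_\lambda\rangle$ and energies $\epsilon_{\chi_\lambda}$ are hypothetical, in arbitrary units, and `dynamic_e` is our name for the sum. A bond orbital that is close to the d orbitals in space (large coupling) and in energy (small gap) accounts for essentially all of $e_\lambda$. The sign follows from the denominator: a function below $\epsilon_d$ raises the d orbital ($e_\lambda > 0$), while a function lying above $\epsilon_d$ contributes with the opposite sign, which in this toy mimics the acceptor case of Fig. 3.

```python
def dynamic_e(eps_d: float, partners) -> float:
    """Dynamic contribution of eq. (33): sum over chi of |<d|H1|chi>|^2 / (eps_d - eps_chi).

    `partners` is an iterable of (coupling, eps_chi) pairs for one symmetry lambda.
    """
    return sum(abs(coupling) ** 2 / (eps_d - eps_chi) for coupling, eps_chi in partners)


# Hypothetical values in arbitrary energy units.
eps_d = 0.0
partners = [
    (0.30, -1.0),    # bond orbital: close in space (large coupling) and energy (small gap)
    (0.02, -15.0),   # ligand core function: weak coupling, large gap
    (0.05, 10.0),    # high-lying function above eps_d: contributes with the opposite sign
]

e_total = dynamic_e(eps_d, partners)
e_bond = dynamic_e(eps_d, partners[:1])
print(f"e_lambda = {e_total:.5f}; bond orbital alone = {e_bond:.5f}")
print(f"level above eps_d alone: {dynamic_e(eps_d, [(0.30, 1.0)]):+.5f}")
```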

It is worth emphasizing the main points and cyclic nature of the arguments summarized in this article. Both the many- and one-electron constructions refer to the projection of the full many-electron problem onto a d basis. In principle, a complete description of all exchange and correlation effects is built (‘folded’) into the structure, though in practice, of course, averaging is implicit within the process, manifested first within the mean d orbital basis and secondly within the interpretation of the e parameters as being dominated by one or two bond functions. Subsequent rationalizations relating empirical e parameters to bond polarization or shape, atomic polarizabilities or whatever, are qualitative and must be judged by the insight they bring to the enterprise. The schemes discussed above have never been offered as routes for quantitative ab initio computation of ligand-field properties, though they could be. They have the virtue, however, of making formal connections between the phenomenological ligand-field procedures of eq. (2) and accepted quantum mechanical principles, and of so providing, via eq. (33), a defensible basis for parameter interpretation. The whole structure is, of course, predicated on the assertion that the ligand-field method ‘works’. One further aspect of the cyclic nature of our exposition is that part of the justification for that assertion is provided by the chemical consistency of the interpretations that have emerged from scores of CLF analyses.

Fig. 4. Relationships between computational methods.


THE PLACE OF LFT IN COMPUTATIONAL CHEMISTRY

LFT does not have the purpose of providing a model for the computation of molecular properties in general. Its domain is restricted to the spectroscopic and magnetic electronic properties of open d or f shells in transition-metal complexes of the Werner type. Furthermore, it is parametric. Nevertheless, its underlying structure is such as to separate d (f) electron properties from all else and so to probe the chemical bonding that surely should be its central object. By being excused the tasks of bonding theory, it leaves to Nature the formidable tasks of accounting for the exchange and correlation effects that are so vexatious for computational chemistry at large. Bonds are formed, the electroneutrality principle is satisfied, the cut and thrust of balancing electron distribution is enacted; and LFT probes the end result. That is why LFT is so effective in reproducing experiment (far more so than even the best ab initio computational techniques), but only within its proper domain.

In Fig. 4 a tree-like scheme is represented [13], showing the relationship of one computational method with another; it is not intended to be comprehensive. It shows, for example, how conventional MO schemes do not map onto LFT and how the angular overlap model (a precursor to the CLF), being an MO scheme at root, is of a quite different ilk to that of the CLF.
In these remarks I will attempt to place a perspective on the validity and the domain of applicability of the ligand field theory that Dr. Gerloch has discussed.

It is easy for a working chemist to be drawn ashore by computational sirens, since many theoretical computational methods are so attractive from a distance and so easily misinterpretable as offering methodology with a hint of permanence. When stoichiometry was new, chemistry had its first ‘reduce-the-entirety-of-chemistry-to-computation’ tool. With the discovery of quantum mechanics, the goal of computing molecular properties from first principles was conceptually achieved. Putting this result into practice has turned out to be a formidable task, and is today a major area of ongoing chemical research. And once it has come to fruition, it will face the equally challenging requirement of reducing the complex molecular orbital descriptions to results in a paradigm useful to the working chemist.

Ligand field theory, as we know it today, is a conceptual development purely within the realm of transition-metal chemistry. It does not belong to, nor is it derived from, molecular orbital theory. Like the molecular mechanics used in organic chemistry, today’s ligand field theory is based on concepts derived from a large body of knowledge within its own chemical domain.

Although one may feel a loss of satisfaction at first in using a bonding theory not derivable from physical cosmology, the benefits of using ligand field theory are immediately obvious and allow a fuller appreciation of the purpose of chemical theory. Ligand field theory is rigorously valid within its domain. The results are directly pertinent to bonding. And, perhaps most importantly, the theory can be used by a chemist ‘in the lab’. Now, just what are the results that one obtains? The cellular ligand field theory is used to describe bonding in mononuclear transition-metal complexes. The parameters describing bonding between each ligand and the central metal are variable; they are adjusted to produce the best agreement between the observed properties of the complex and those calculated from the theory. When a computation is finished, the user has a set of parameters describing the strengths of the various bonding interactions between ligand and metal. Each parameter represents a particular component of a particular bond; for example, there will be separate parameters for the sigma and pi bonds between each ligand and the central metal. (And the pi interaction can be further divided by direction, if this is appropriate.)
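The fitting step just described is, at heart, a least-squares adjustment of parameters. The sketch below (Python with NumPy) is a deliberately reduced toy and not the procedure of any CLF program. It assumes a trans-MA4B2 complex with A ligands on ±x, ±y and B ligands on ±z, uses the standard angular-overlap-style coefficients relating the five d-orbital energies to $e_\sigma$ and $e_\pi$ of each ligand type, and treats those orbital energies themselves as the ‘observations’. Real analyses fit transition energies, magnetic and EPR data through a full many-electron calculation. All numbers and names are hypothetical.

```python
import numpy as np

# Toy model. Parameter order: e_sigma(A), e_sigma(B), e_pi(A), e_pi(B); A on +/-x, +/-y, B on +/-z.
PARAMS = ("e_sigma_A", "e_sigma_B", "e_pi_A", "e_pi_B")
DESIGN = np.array([
    [1.0, 2.0, 0.0, 0.0],   # d_z2
    [3.0, 0.0, 0.0, 0.0],   # d_x2-y2
    [0.0, 0.0, 4.0, 0.0],   # d_xy
    [0.0, 0.0, 2.0, 2.0],   # d_xz
    [0.0, 0.0, 2.0, 2.0],   # d_yz
])


def fit_parameters(observed: np.ndarray) -> np.ndarray:
    """Least-squares estimate of the four bonding parameters from five orbital energies."""
    params, *_ = np.linalg.lstsq(DESIGN, observed, rcond=None)
    return params


# Hypothetical 'true' parameters (cm^-1) and synthetic, slightly noisy observations.
true_params = np.array([4000.0, 3000.0, 500.0, 800.0])
rng = np.random.default_rng(42)
observed = DESIGN @ true_params + rng.normal(scale=20.0, size=5)

fitted = fit_parameters(observed)
for name, value in zip(PARAMS, fitted):
    print(f"{name:10s} = {value:7.1f} cm^-1")
```

Only differences between d-orbital energies are observable spectroscopically, and in this toy the differences alone leave one combination undetermined (a uniform shift of all five levels), which is one reason real fits draw on several kinds of data.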

The immediate utility of such a scheme is clear. It is indeed convenient to compare one complex to another in terms of local bonding interactions. Most importantly, it is possible within this regime to speak of computational results directly in terms of bonding properties. And there is a lagniappe: this sort of calculation is efficient.

I want to touch on the limitations of ligand field theory. I think that the cellular ligand field theory, although rather mature in its treatment of the first transition series, can still benefit from further exploration of the second and third rows of the d-block, and from a treatment of the f-series. There seems still to be untapped potential, both for development of the theory and for better understanding of complexes of the heavier elements. It is difficult to say, or even to speculate, whether the fundamental concepts underlying ligand field theory might usefully be applied to non-Wernerian inorganic chemistry. It is appropriate to add at this point that the numerical algorithms used in these calculations are both mature and robust, and should not need major development unless the theory itself or its realm of applicability changes significantly.

When one considers the panorama of computational chemistry today, it is clear that the variety of the types of calculation provides one of the field’s richest properties. The theory that Dr. Gerloch has described is among those modern theories that provide useful bonding information to chemists. Inorganic chemistry would be poorer without it and, I believe, richer with further development of it.
The algorithm that Dr. Prince has described once again opens the possibility that large-molecule structure determinations will one day be done with something approaching the facility now enjoyed only by small-molecule diffractionists. It has been true until quite recently that the major practical and theoretical advances in the science behind crystallography have been applied easily and naturally to the purpose of facilitating small-structure determinations, while macromolecular crystallography has received less benefit. The means of solving structures via Patterson synthesis [1,2], the discovery and development of the direct methods [3,4], and the invention of the four-circle diffractometer [5] have all had far greater facilitating influences on small-molecule science than on large. The maximum-entropy methods may come to be an important facilitating influence in macromolecular work.

In putting a context around the maximum-entropy method as a phasing tool, it is worthwhile to examine the phasing tools used in small-molecule work, as they would be viewed in importance by a practitioner in the field. (Professor Hauptman has described the solution of the theoretical problem of determining phases from a set of amplitudes [6]. It is interesting to see that practical and theoretical advances can follow different, though related, courses.) Before the advent of the direct methods, one could attempt to determine phases by model building, or by application of the Patterson function, a self-convolution of the structure which can be calculated in a phaseless transformation. These methods, as viewed by today's practitioner, rely on one or more of the following: (1) a non-uniform distribution of electron density; (2) the presence of useful symmetry elements; and (3) a priori chemical knowledge of the contents of the asymmetric unit. In practice, these methods often depend on the skill and experience of the practitioner.

The phase problem was solved in principle (for large and small systems) with the discovery of the Hauptman-Karle determinants, the non-negativity of which is a necessary consequence of the non-negativity and atomicity of electron density within a crystal. Of course, solving the problem in practice was another matter. The determinants, in their most general form, simply were computationally too difficult at the time of their discovery to yield a closed-form numerical solution for a given crystal structure. They did, however, yield the means for achieving a practical solution to the phase problem.

The third-order determinant, $D_3$ (eq. 1), yields an expression on the basis of which certain values of the combination $(\varphi_{-\mathbf{h}} + \varphi_{\mathbf{k}} + \varphi_{\mathbf{h}-\mathbf{k}})$ can be ruled out if the amplitudes are large enough. (In the case of a centrosymmetric crystal the phases φ are restricted to values of zero and π, and the theory develops slightly differently.) However, even when the three-phase combination cannot be determined with certainty, one can still apply probability theory to establish an expected distribution for its value [7-10].

0169-7439/91/$03.50 © 1991 - Elsevier Science Publishers B.V.

Chemometrics and Intelligent Laboratory Systems


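$$D_3 = \begin{vmatrix} 1 & U_{-\mathbf{h}} & U_{-\mathbf{k}} \\ U_{\mathbf{h}} & 1 & U_{\mathbf{h}-\mathbf{k}} \\ U_{\mathbf{k}} & U_{\mathbf{k}-\mathbf{h}} & 1 \end{vmatrix} \geq 0 \qquad (1)$$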
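The following is a minimal numerical illustration of eq. (1). It rests on toy assumptions of our own, not the author's: N equal point atoms in a one-dimensional cell, with unitary structure factors $U_h = N^{-1}\sum_j \exp(2\pi i h x_j)$. The helper names are hypothetical. Under these assumptions the matrix in eq. (1) is a weighted sum of outer products $v_j v_j^\dagger$, so its determinant cannot be negative for any choice of atomic positions.

```python
import numpy as np

def unitary_sf(h, positions):
    """U_h for N equal point atoms at fractional coordinates x_j (1-D toy model)."""
    x = np.asarray(positions, dtype=float)
    return np.exp(2j * np.pi * h * x).mean()

def karle_hauptman_d3(h, k, positions):
    """Third-order determinant of eq. (1); rows and columns indexed by 0, h, k."""
    idx = (0, h, k)
    # Element (a, b) is U_{idx[a] - idx[b]}: the diagonal is U_0 = 1,
    # (0, 1) is U_{-h}, (1, 2) is U_{h-k} and (2, 1) is U_{k-h}.
    m = np.array([[unitary_sf(a - b, positions) for b in idx] for a in idx])
    # The matrix is Hermitian, so its determinant is real up to rounding.
    return np.linalg.det(m).real

rng = np.random.default_rng(0)
atoms = rng.random(5)  # five atoms at random positions in the cell
print(karle_hauptman_d3(2, 5, atoms))
```

With a single atom the matrix has rank one and the determinant is zero, which is the boundary case of the inequality.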


The application of probability theory thus becomes an important area of work in the phase problem. The result of prime importance for practical application was the tangent formula (eq. 2), which gives an indication for the phase of a reflection h in terms of the phases and amplitudes of other reflections which can participate with h in third-order Hauptman-Karle determinants. The tangent formula is used in conjunction with its variance [11], from which inferences are drawn about the reliability of the indicated phase.

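$$\tan\varphi_{\mathbf{h}} = \frac{\sum_{\mathbf{k}} |E_{\mathbf{k}} E_{\mathbf{h}-\mathbf{k}}| \sin(\varphi_{\mathbf{k}} + \varphi_{\mathbf{h}-\mathbf{k}})}{\sum_{\mathbf{k}} |E_{\mathbf{k}} E_{\mathbf{h}-\mathbf{k}}| \cos(\varphi_{\mathbf{k}} + \varphi_{\mathbf{h}-\mathbf{k}})} \qquad (2)$$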

The tangent formula served as the launch pad for the next important practical developments: the multiple tangent method [12] and the popular computer program (MULTAN) employing it [13]. This was the development which finally allowed a rapid growth in the number of laboratories conducting crystal structure analyses, and the concomitant growth in the importance of crystallography to chemists. Further refinements in methodology and more efficient algorithms and programs [14,15] led to further rustication of X-ray structure determination, as the esoteric aspects of the phase problem became buried in packaged protocols.

Meanwhile, the probability theory that allowed the direct methods to stimulate the flowering of small-molecule diffraction work proved initially to be its undoing in large-molecule work, since the reliability of a phase indication changes inversely with the square root of the number of atoms in the cell. So macromolecular diffractionists have not been able to share fully in the practical benefits of the solution of the mathematical phase problem. Rather, the labor-intensive multiple isomorphous replacement method has remained a workhorse for protein structure determination.
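The statement above rests on a standard result that the text does not quote: Cochran's distribution. For N equal atoms the triplet sum $\Phi = \varphi_{-\mathbf{h}} + \varphi_{\mathbf{k}} + \varphi_{\mathbf{h}-\mathbf{k}}$ has $P(\Phi) \propto \exp(\kappa\cos\Phi)$, with $\kappa = 2|E_{\mathbf{h}}E_{\mathbf{k}}E_{\mathbf{h}-\mathbf{k}}|/\sqrt{N}$. The short sketch below shows how κ falls for fixed |E| values as the cell grows. The function name is hypothetical.

```python
import math

def cochran_kappa(e_h, e_k, e_hk, n_atoms):
    """kappa = 2|E_h E_k E_{h-k}| / sqrt(N) for N equal atoms (Cochran)."""
    return 2.0 * abs(e_h * e_k * e_hk) / math.sqrt(n_atoms)

# Same |E| values (2.0 each), increasingly large unit cells.
for n in (50, 500, 5000, 50000):
    print(n, round(cochran_kappa(2.0, 2.0, 2.0, n), 3))
```

Quadrupling the number of atoms halves κ. For protein-sized cells, even strong reflections therefore give a nearly flat distribution for Φ.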


Now, where does the principle of maximum entropy fit in with all of this? The important conceptual property of maximum entropy is that, like the Hauptman-Karle determinants, it is consistent with the analysis of data arising from a non-negative electron density distribution [16,17]. The principle of maximum entropy has also had a practical problem in common with the determinantal equations: useful, widely applicable numerical algorithms for diffraction analysis have not appeared as obvious consequences of theory. Thus, the use of the dual method that Dr. Prince has described here and elsewhere [18] is a practical development which has merited a thorough test. The examples we have seen today show the usefulness of the algorithm. While the clarification of noisy electron density maps is valuable and would itself justify full exploration of the method, it is in the a priori determination of phases that I believe the maximum-entropy method can be most profoundly exploited.

The maximum-entropy method has its roots in probability theory, as Jaynes has explained in detail [19]. The modern developments by Shannon [20] (for information theory) and Jaynes represent the climax of a long conceptual development. While closely tied to probability theory, the maximum-entropy method, in its most basic notion, formalizes prior ignorance of a system and admits experimental data as constraints. It does not employ conditional probability distributions, and apparently does not suffer from a loss of efficacy with increasing size of the problem. Considering all of this, it is natural to regard the maximum-entropy method as a logical and potentially powerful extension of the direct methods, with promise for macromolecular diffraction studies.
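Dr. Prince's dual algorithm is not reproduced in the text. What follows is therefore only a generic sketch of the dual formulation of maximum entropy, under assumptions of our own:

- a one-dimensional grid;
- a uniform prior;
- exact constraints on the real and imaginary parts of a few low-order Fourier coefficients, as if their phases were known.

The dual turns a constrained maximization over every grid value into an unconstrained convex minimization over one Lagrange multiplier per constraint. The resulting density is positive by construction. Function names are hypothetical.

```python
import numpy as np

def fourier_features(n_grid, max_h):
    """Rows cos(2*pi*h*x) and sin(2*pi*h*x), h = 1..max_h, on a uniform 1-D grid."""
    x = np.arange(n_grid) / n_grid
    h = np.arange(1, max_h + 1)[:, None]
    return np.vstack([np.cos(2 * np.pi * h * x), np.sin(2 * np.pi * h * x)])

def maxent_density(F, c, tol=1e-12, max_iter=200):
    """Maximum-entropy p on the grid (uniform prior) subject to F @ p = c.

    Newton's method on the convex dual g(lam) = log sum_i exp(lam . F[:, i]) - lam . c;
    at its minimum, p_i is proportional to exp(lam . F[:, i])."""
    def primal(lam):
        z = lam @ F
        w = np.exp(z - z.max())            # shift for numerical stability
        return w / w.sum()

    def dual(lam):
        z = lam @ F
        return z.max() + np.log(np.exp(z - z.max()).sum()) - lam @ c

    lam = np.zeros(F.shape[0])
    for _ in range(max_iter):
        p = primal(lam)
        mean_f = F @ p
        grad = mean_f - c                  # residual in the constraints
        if np.linalg.norm(grad) < tol:
            break
        hess = (F * p) @ F.T - np.outer(mean_f, mean_f)   # feature covariance under p
        step = np.linalg.solve(hess, grad)
        dec = grad @ step
        t = 1.0
        if dec > 1e-12:                    # backtrack only while far from the optimum
            g0 = dual(lam)
            while t > 1e-10 and dual(lam - t * step) > g0 - 0.25 * t * dec:
                t *= 0.5
        lam = lam - t * step
    return primal(lam)

grid = 128
xs = np.arange(grid) / grid
target = np.exp(-((xs - 0.3) / 0.05) ** 2) + 0.01   # hypothetical positive density
target /= target.sum()
F = fourier_features(grid, 3)
p = maxent_density(F, F @ target)
print(np.abs(F @ p - F @ target).max())
```

The returned map reproduces the given coefficients and otherwise stays as flat as possible. It does not reproduce the detail of the target density, which is the "prior ignorance" property described above.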
Abstract


Carey, W.P. and Wangen, L.E., 1991. Determining chemical characteristics of plutonium solutions using visible spectrometry and multivariate chemometric methods. Chemometrics and Intelligent Laboratory Systems, 10: 245-257.

Two chemometric analysis approaches for rapidly screening samples are presented. The first method is for determining Pu(III) and nitric acid concentrations by using the multivariate calibration technique of partial least squares (PLS) regression. Quantitation of plutonium using its visible spectrum is straightforward; however, the effects of nitric acid on the Pu(III) absorption spectra are subtle, and nitric acid quantitation from the absorbance spectrum is more difficult. In this study PLS regression is successfully applied to quantitate both plutonium and nitric acid by using the information contained in the absorption spectra of appropriate solutions. Evaluation of the calibration models, using test samples that span the range of the calibration concentrations, gave predictions consistent with the standard error of the calibration models.

Secondly, pattern recognition methods are used to investigate the effects of various amounts of nitric acid, fluoride, or oxalate on visible spectra of Pu(IV) solutions. The methods enable qualitative estimates of the solution composition, which can potentially be used to adjust solution properties to desired specifications. The main pattern recognition methods employed are nearest neighbor classification and principal components analysis.


DETERMINATION  OF  Pu(III)  AND  NITRIC  ACID 

Plutonium can be precipitated from nitric acid solutions by forming an insoluble oxalate salt of Pu(III). However, the concentrations of both total nitric acid (C_HNO3) and oxalic acid affect the solubility of the Pu(III) oxalate product [1,2]. Pu(III) oxalate solubility is at a minimum between 0.5 and 1.0 M nitric acid and with a 0.05 to 0.1 M stoichiometric excess of oxalic acid. At these concentrations the solubility of Pu(III) ranges between 2 and 20 mg/l. At higher nitric acid concentrations, the solubility of Pu(III) increases; for example, in 2.0 M nitric acid the Pu(III) concentration increases tenfold. There are also indications that increasing the oxalic acid concentration above 0.2 M will lead to increased solubility of the plutonium. To assist in optimizing solution conditions for the precipitation reaction of Pu(III) oxalate, it would be beneficial to have a rapid analytical method for estimating the concentrations of plutonium and nitric acid.
In this study we evaluated a method based on partial least squares (PLS) regression for predicting both Pu(III) and nitric acid concentrations using the visible absorption spectra of solutions containing the species of interest. Several techniques based on visible absorption spectroscopy have been developed for estimating Pu(III), and quantitation is fairly straightforward [3-6]. However, determination of the nitric acid concentration from the visible absorption spectra is more difficult because of the subtle effects of nitric acid on the spectrum. In this paper we demonstrate the use of PLS for extracting the small signal of the nitric acid effect in the presence of a much larger signal caused by the Pu(III) absorption. This information provides a measure of nitric acid concentration that can be used in studying the precipitation reaction.

The fundamental theory and applications of PLS have been investigated by several researchers [7-11]. PLS uses a large part or all of the spectral data points to develop linear combinations of the spectral absorbances that correlate with the analyte concentration vector. The PLS regression procedure is based on an algorithm in which the scores are orthogonal. This method is similar to principal component regression in that the spectral response matrix is factor analyzed into orthogonal vectors based on the variance. However, it includes information from the analyte concentration vector in the matrix decomposition procedures. The model built by the PLS algorithm between the spectral and concentration variables during calibration is different for each analyte insofar as their effects on the spectra are different. Two separate PLS models were developed, one each for Pu(III) and nitric acid. Using the models developed during calibration, we predicted analyte concentrations in several solutions not used in calibration.
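The PLS1 procedure described above can be sketched with the NIPALS algorithm. That procedure has orthogonal scores, concentration information entering the decomposition, and one model per analyte. The sketch below is a generic textbook version, not the University of Washington program the authors used. The function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 regression by NIPALS for one analyte. X: samples x wavelengths."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # mean-centred working copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                        # weights use the concentration vector
        w /= np.linalg.norm(w)
        t = E @ w                          # scores; successive scores are orthogonal
        tt = t @ t
        p = E.T @ t / tt                   # spectral loadings
        qa = (f @ t) / tt                  # regression of concentration on the score
        E = E - np.outer(t, p)             # deflate the spectra
        f = f - qa * t                     # deflate the concentrations
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)    # regression vector for centred spectra
    return x_mean, y_mean, b

def pls1_predict(model, X):
    x_mean, y_mean, b = model
    return y_mean + (X - x_mean) @ b

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 1.0
model = pls1_fit(X, y, 2)
print(np.round(pls1_predict(model, X[:3]), 3))
```

Two models, one for each analyte, would simply be two calls with different concentration vectors. With as many components as independent variables, PLS1 reduces to ordinary least squares.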

Experimental 

All chemicals were reagent grade, except for the plutonium nitrate stock solutions. Plutonium nitrate stock solutions were obtained by dissolving PuO2 in HNO3/HF, followed by fluoride removal using ion exchange. The concentrations of these stock solutions were determined by standard radiochemical methods based on gamma-ray spectroscopy with a relative standard deviation of 0.5% [12]. We prepared a 25-sample calibration set and a 6-sample test set by performing volumetric dilutions of the stock solutions and adjusting nitric acid concentrations. These solutions were prepared to cover the acid range. Nitric acid concentrations were determined by a standard addition method [13].

We recorded spectra between 500 and 880 nm on each sample using a 0.2 cm path length flow cell. The spectrometer for these experiments was an LT Industries Quantum 1200. This instrument allows for the remote placement of sample cell and detector in an isolated glove box, with a fiber-optic bundle transporting the light between source, sample, and detector. The resolution obtained with this instrument is on the order of 1 nm, with the scan for the visible region requiring 200 ms. For each sample, ten 200-ms scans were acquired and averaged.

Data analysis was performed using a PLS program developed at the University of Washington [14]. This code was implemented on a VAX-11/780.

Results 

Visible spectra of the plutonium species appear in Figs. 1 and 2. Fig. 1 shows the sensitivity of several Pu(III) absorption bands in solutions containing 2.0 to 29.9 g/l of Pu(III). The nitric acid concentration in these four samples was approximately 1.3 M. In high-precision analytical measurements, the bands at 565 and 601 nm are commonly used to quantitate Pu(III) after adjustment of solution conditions. The effect of varying nitric acid concentration on these spectra is illustrated in Fig. 2, where Pu(III) was held constant (6.0 g/l) and nitric acid was varied from 0.6 to 2.3 M. This effect is most readily observed at 565 nm, where the absorption peak tends to narrow or become more symmetrical with increasing nitric acid concentration, and between 750 and 825 nm, where a change in one or more underlying absorbance bands causes small changes in the spectra.
Using the 25-sample calibration set, separate PLS models were built for Pu(III) and nitric acid. All variables were mean-centered and scaled by their standard deviation before the model was built. For both models the number of component vectors to use was determined by cross validation (alternating one-sample-removed method), and the final models included all 25 samples. Table 1 shows the percentage variance explained for these calibration samples by the PLS model for both Pu(III) and nitric acid and the spectra. The first component explains 94.35% of the variance in the spectral responses. Evidently Pu(III) changes are the cause of this, because 98.80% of its variance is explained by this component. This is as expected on the basis of Fig. 1. Nitric acid, however, has only 5.78% of its variance described by the first PLS component. For nitric acid, more of the variation is explained by components that explain lower amounts of spectral variance. Because very little of the total spectral variance is used to model nitric acid molarity, we expect poorer results.

Fig. 2. Effect of nitric acid on Pu(III) absorbance spectra (x-axis: wavelength, 500-850 nm). Nitric acid varies from 0.6 to 2.3 M with a constant 6.0 g/l Pu(III) concentration.

Fig. 3. Actual Pu(III) concentration versus predicted Pu(III) concentration based on a two-latent-variable PLS model (x-axis: actual Pu(III) concentration, g/l).
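The component-selection step can be sketched as leave-one-out cross validation. Each variable is autoscaled with statistics from the training samples only. The sketch follows assumptions of our own:

- "one-sample-removed" is treated as plain leave-one-out;
- a compact NIPALS PLS1 is re-implemented so the block stands alone;
- the function names and synthetic data are hypothetical.

The rule for picking the number of components from PRESS, here the minimum, is also our assumption.

```python
import numpy as np

def pls1_coef(Z, yc, n_comp):
    """Compact NIPALS PLS1 on centred data; returns the regression vector."""
    E, f = Z.copy(), yc.copy()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w
        tt = t @ t
        p, qa = E.T @ t / tt, (f @ t) / tt
        E, f = E - np.outer(t, p), f - qa * t
        W.append(w); P.append(p); q.append(qa)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q))

def loo_press(X, y, max_comp):
    """PRESS for 1..max_comp PLS components by leave-one-out cross validation.

    Each spectral variable is mean-centred and divided by its standard
    deviation, with both statistics computed from the training samples only."""
    n = len(y)
    press = np.zeros(max_comp)
    for i in range(n):
        keep = np.arange(n) != i
        Xt, yt = X[keep], y[keep]
        mu, sd = Xt.mean(axis=0), Xt.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0                  # guard against constant columns
        Z, zi = (Xt - mu) / sd, (X[i] - mu) / sd
        ym = yt.mean()
        for a in range(1, max_comp + 1):
            b = pls1_coef(Z, yt - ym, a)
            press[a - 1] += (y[i] - (ym + zi @ b)) ** 2
    return press

rng = np.random.default_rng(1)
T = rng.normal(size=(25, 2))                       # two hidden factors
X = T @ rng.normal(size=(2, 50)) + 0.01 * rng.normal(size=(25, 50))
y = T[:, 0] - 0.7 * T[:, 1] + 0.01 * rng.normal(size=25)
press = loo_press(X, y, 6)
print(np.round(press, 4), "->", int(np.argmin(press)) + 1, "components")
```

Autoscaling inside each fold keeps the left-out sample from influencing the scaling. That mirrors how a genuinely new sample would be treated.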

The accuracy of a multivariate model can be visually examined by plotting the actual calibration concentrations versus the predicted values for each sample. For Pu(III) the 25 sample concentrations are plotted versus their estimated concentrations using a two-latent-variable model, shown in Fig. 3. As expected, Pu(III) is well modeled, with an r² statistic of 1.00 and a standard error of 0.20 g/l. Fig. 4 provides a similar plot of measured versus predicted concentrations for nitric acid using a six-latent-variable model. In this case the model describes the overall nitric acid effect on the spectra but with a greater degree of error than the Pu(III) model. The r² statistic for the nitric acid model was 0.93, with a standard error of 0.18 M.

Fig. 4. Actual nitric acid concentration versus predicted nitric acid concentration from a six-latent-variable PLS model (x-axis: actual HNO3 concentration, M).

TABLE 1

Variance described by PLS models for Pu(III) and nitric acid

Latent     Spectral response      Pu(III)                Nitric acid
variable   Each (%)   Total (%)   Each (%)   Total (%)   Each (%)   Total (%)

Pu(III) model
1          94.53      94.53       98.80      98.80
2           3.61      98.14        1.16      99.96

Nitric acid model
1          94.35      94.35                               5.78       5.78
2           1.81      96.17                                         35.34
3           3.51      99.68                               1.17      36.51
4           0.14      99.82                              28.12      64.63
5           0.05      99.87                              16.91      81.55
6           0.02      99.90                              11.52      93.07

A better measure of the validity of the calibration models is to examine their predictive capability using samples not included in the calibration sample set. To validate the constructed models, we analyzed a test set containing six samples with known Pu(III) and nitric acid concentrations in the same manner as the calibration set samples. These samples were prepared using the same techniques as for the calibration samples. Table 2 compares the resulting predictions with known values. The calibration model is validated if the predicted values of unknowns are within the standard error range of the model, which is a calculation of the standard deviation of the model residuals. For example, approximately 95% of future samples should fall within twice the standard error if the unknowns come from the same population as the standards. For Pu(III), with a standard error of 0.20 g/l, all of the predictions were within two standard errors, with four of the six predictions within one standard error. For nitric acid, all predicted values are within the two-standard-error limit estimated by the model (standard error 0.18 M), and half of these samples are within one standard error. The estimated standard errors of prediction were 0.25 g/l and 0.23 M for Pu(III) and nitric acid, respectively, which are slightly greater than those of the calibration set for both analytes. Although the number of samples was limited in both calibration and test sets, there was no statistical difference between the standard errors based on an F-test comparison. The results of this test set provide confidence that both the Pu(III) and nitric acid models are valid over the range of concentrations normally encountered in the plutonium oxalate precipitation studies.
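The validation arithmetic can be checked against the values in Table 2. The sketch makes these assumptions:

- the standard error of prediction is the root-mean-square prediction error with divisor n;
- a tiny tolerance absorbs floating-point rounding in the "within k standard errors" counts;
- the function name is hypothetical.

Recomputed from the rounded table entries, this gives 0.25 g/l for Pu(III) and about 0.22 M for nitric acid, against the 0.23 M printed. The counts within one and two standard errors agree with those stated in the text.

```python
import numpy as np

# Test-set values transcribed from Table 2.
pu_true = np.array([1.99, 5.97, 29.9, 19.9, 4.67, 15.6])     # g/l
pu_est = np.array([2.00, 5.99, 30.3, 19.7, 4.62, 15.2])
hno3_true = np.array([1.98, 1.15, 1.07, 2.13, 2.08, 0.94])   # M
hno3_est = np.array([1.96, 1.47, 0.92, 2.47, 1.97, 1.16])

def check_predictions(true, est, sec):
    """Return (SEP, count within 1 SEC, count within 2 SEC)."""
    d = est - true
    sep = np.sqrt(np.mean(d ** 2))            # RMS prediction error, divisor n
    within1 = int(np.sum(np.abs(d) <= sec + 1e-9))
    within2 = int(np.sum(np.abs(d) <= 2 * sec + 1e-9))
    return sep, within1, within2

for name, t, e, sec in (("Pu(III)", pu_true, pu_est, 0.20),
                        ("nitric acid", hno3_true, hno3_est, 0.18)):
    sep, w1, w2 = check_predictions(t, e, sec)
    print(f"{name}: SEP = {sep:.3f}, within 1 SE: {w1}/6, within 2 SE: {w2}/6, "
          f"F = SEP^2/SEC^2 = {(sep / sec) ** 2:.2f}")
```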
TABLE 2

Prediction results for test set samples

Sample   Pu(III) (g/l)                      Nitric acid (M)
         True    Estimated   Difference     True    Estimated   Difference
1         1.99    2.00        0.01          1.98    1.96         0.02
2         5.97    5.99        0.02          1.15    1.47         0.32
3        29.9    30.3         0.4           1.07    0.92         0.15
4        19.9    19.7         0.2           2.13    2.47         0.34
5         4.67    4.62        0.05          2.08    1.97         0.11
6        15.6    15.2         0.4           0.94    1.16         0.22
Standard error of prediction  0.25                               0.23

We have demonstrated the use of the Pu(III)-nitric acid absorbance spectra coupled with PLS regression for the determination of Pu(III) and nitric acid concentrations over the analyte ranges of 1.99 to 29.9 g/l plutonium and 0.44 to 3.08 M nitric acid. The precision of these predictions is suitable for studying the effects of oxalic acid and nitric acid concentrations during the precipitation of plutonium oxalate. Although greater precision could be obtained using other, more complex methods, the information gained from these spectral measurements is adequate for real-time analyses. The coupling of multivariate regression techniques with absorbance spectroscopy provides quantitation of both Pu(III) and nitric acid from a single, easy-to-perform spectral measurement, thereby simplifying the instrumentation used in studying the precipitation reaction.


QUALITATIVE DETERMINATION OF Pu(IV) COMPLEX COMPOSITION

The Vis-NIR absorption spectra of Pu(IV) in nitric acid have several intense bands [15]. The number, position, and intensity of these bands depend on the total nitric acid (C_HNO3) molarity and the plutonium oxidation state. The spectra may also be influenced by the presence of other cations and anions. Thus, it was hypothesized that Vis-NIR absorption spectroscopy could provide information important for the chemical characterization of acidic plutonium solutions. Such information could be used to chemically adjust such solutions before their treatment by ion exchange. This study was designed to determine the effect of fluoride and oxalate on the chemistry of Pu(IV)-nitric acid solutions as evidenced by changes in their spectra. Fluoride and oxalate complexes of plutonium do not adsorb to the ion exchange resins being used in this study.

The research questions posed were:

- How many different absorbing species are present in the plutonium solutions ranging from 4 M to 10 M C_HNO3 in the presence of either fluoride or oxalate?

- What spectral changes result from the addition of fluoride or oxalate to Pu(IV)-nitric acid solutions?

- Can the distribution ratios (Rd's) and initial concentrations of nitric acid, plutonium, fluoride, and oxalate be predicted from the Vis-NIR spectra of the solutions?

- Can we develop a classification procedure using Vis-NIR spectra that will separate good solutions from bad ones with respect to ion exchange behavior (as defined by Rd's)?

Experimental 

Solutions  and  spectroscopy 
The data sets used in this study consisted of spectra collected from two different experiments, which were identical except for the substitution of oxalate for fluoride in the second experiment. The solutions used are described in Table 3. Nitric acid molarities ranged from 4.0 to 10.0. Fluoride and oxalate concentrations ranged from 8.37 × 10^-3 M to 3.35 × 10^-2 M, plus a zero value. For all fluoride and oxalate concentrations, two different concentrations of Pu(IV), 8.37 × 10^-3 M and 4.18 × 10^-2 M, were used. The spectra from solutions containing no fluoride or oxalate are common to both data sets.

All the spectra were recorded with a Quantum 1200 Vis-NIR spectrometer from LT Industries after sufficient time for the solutions to equilibrate. The wavelength region recorded was from 400 to 880 nm in 0.4-nm increments. The solutions were contacted with anion exchange resin (40-70 mesh Lewatit MP-500-FK) after their spectra were recorded. The Rd values were calculated by using initial and final plutonium concentrations for the fluoride data. The analyses are not presented for the oxalate data.
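The text does not give the Rd formula. A common batch-contact definition, assumed here, is $R_d = \frac{C_0 - C_f}{C_f} \cdot \frac{V}{m}$ (ml/g). In it, $C_0$ and $C_f$ are the initial and final plutonium concentrations in solution, V is the solution volume and m is the resin mass. The volume, mass and concentrations in the example are hypothetical, as is the function name.

```python
def distribution_ratio(c_initial, c_final, volume_ml, resin_mass_g):
    """Batch distribution ratio Rd in ml/g: plutonium held per gram of resin
    divided by plutonium remaining per ml of solution after contact."""
    if c_final <= 0:
        raise ValueError("final concentration must be positive")
    sorbed_over_remaining = (c_initial - c_final) / c_final
    return sorbed_over_remaining * volume_ml / resin_mass_g

# Hypothetical batch contact: 10 ml of solution, 0.5 g of resin,
# plutonium falling from 8.37e-3 M to 1.0e-3 M.
print(round(distribution_ratio(8.37e-3, 1.0e-3, 10.0, 0.5), 1))
```

Only the ratio of concentrations enters, so any consistent concentration unit can be used.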

TABLE 3

Composition of solutions used to determine the effect of fluoride or oxalate on the spectra of Pu(IV)-nitric acid solutions *

Nitric acid            4 M, 5 M, 6 M, 7 M, 8 M, 9 M, 10 M
Plutonium              8.37 × 10^-3 M, 4.18 × 10^-2 M
Fluoride or oxalate    0.00, 8.37 × 10^-3 M, 1.67 × 10^-2 M, 2.51 × 10^-2 M, 3.35 × 10^-2 M

* At each combination of nitric acid molarity and plutonium concentration, solutions containing either fluoride or oxalate at the indicated concentrations were prepared.

Data reduction, analysis, and interpretation
Preprocessing the spectral data consisted of several steps that were not always performed, depending on the particular study objectives. To reduce the number of variables that the computer programs must handle, all the spectra were reduced from 1200 to 600 absorbance values per spectrum by performing a two-point average of successive absorbance values. Occasional baseline shifts were corrected by a simple baseline subtraction method. For each spectrum this involved determining the minimum absorbance value, $a_{k^*}$, in that spectrum; computing the average of $a_{k^*-1}$, $a_{k^*}$, and $a_{k^*+1}$; and subtracting this average from every absorbance value in that spectrum. More sophisticated methods of baseline correction for these spectra would be difficult to implement because the spectra are so complex: absorbance values approached baseline in only one or two spectral intervals. To adjust for different concentrations of plutonium in different data sets, we normalized each spectrum to a sum of 1.0, $a'_k = a_k / \sum_{k=1}^{600} a_k$. However, this normalization is not done when the best model for predicting plutonium concentrations is desired.
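The three preprocessing steps can be sketched with NumPy as follows. The sketch assumes:

- the baseline offset is the mean of the minimum absorbance and its two neighbours, and at either end of the range only the available neighbour is used;
- the function names and the random stand-in spectra are hypothetical.

```python
import numpy as np

def pair_average(spectra):
    """Average successive pairs of absorbances (e.g. 1200 -> 600 points)."""
    s = np.asarray(spectra, dtype=float)
    return s.reshape(s.shape[0], -1, 2).mean(axis=2)

def baseline_subtract(spectra):
    """Subtract from each spectrum the mean of its minimum absorbance and the
    neighbouring points on either side."""
    out = np.empty_like(spectra, dtype=float)
    for i, s in enumerate(spectra):
        k = int(np.argmin(s))
        lo, hi = max(k - 1, 0), min(k + 2, len(s))
        out[i] = s - s[lo:hi].mean()
    return out

def normalize_to_unit_sum(spectra):
    """Scale each spectrum so its absorbances sum to 1.0 (skip this step when
    the spectra are used to predict plutonium concentration)."""
    return spectra / spectra.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
raw = rng.random((3, 1200)) + 0.2          # stand-in for three 1200-point spectra
spectra = normalize_to_unit_sum(baseline_subtract(pair_average(raw)))
print(spectra.shape, spectra.sum(axis=1))
```

The steps are written as separate functions because, as the text notes, they were not always all applied.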

Data analysis methods consisted mainly of variations of the mathematical-statistical procedures most commonly referred to as principal components analysis. All of these methods involve decomposition and analysis of a spectral data matrix whose individual rows consist of the Vis-NIR spectrum of one of the experimental solutions under study. The specific methods used were pattern recognition based on principal components modeling (SIMCA) [16], pattern recognition based on nearest neighbor classification [17], pattern recognition based on other methods contained in the ADAPT package [18], and principal components regression [19,20].

Results 

For each data set, there are 70 spectra corresponding to seven C_HNO3 molarities, five fluoride or oxalate concentrations, and two plutonium concentrations (2 × 5 × 7 = 70). Thus, we have a large number of spectra that are quite complex and that vary considerably with changing concentrations. Fig. 5 demonstrates this complexity and the changes caused by fluoride at 8 M C_HNO3 for an 8.37 × 10^-3 M plutonium solution. The highest fluoride concentration corresponds to a 4:1 fluoride-to-plutonium molar ratio. The peaks at 420 and 850 nm in the 0.0 M fluoride spectrum are absent in the high-fluoride spectrum. There are numerous changes in relative peak heights. The band at 475 nm is less intense in the high-fluoride spectrum. Oxalate does not have as great an effect on the spectra of Pu(IV)-nitric acid solutions as does fluoride (Fig. 6).

Fig. 5. Spectra of 8.37 × 10^-3 M Pu(IV) with fluoride (x-axis: wavelength, nm).

Number  of  absorbing  species 

Matrix rank determination has become a fairly common procedure in spectroscopy for estimating the number of absorbing species in a series of mixtures [21]. This procedure is valid provided Beer's model is obeyed, that is, if the total absorbance is a linear summation of the absorbances of the individual species. The major difficulty with the procedure is determining the chemically meaningful rank. Because of noise, the mathematical rank will usually be the lesser of I and K for a data set composed of I spectra, where the i-th row of the matrix contains the spectrum of the i-th mixture and K is the number of wavelengths at which there are absorbance values. Various methods for determining the number of absorbing species have been proposed. In this paper, we will discuss only the method based on cross validation. The spectral data matrix used for this analysis consisted of either the fluoride or the oxalate spectral data set. In each case, there are 70 spectra with 600 absorbance values, i.e., I × K = 70 × 600.
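The rank argument can be illustrated with made-up data rather than the plutonium spectra. Three hypothetical Gaussian bands are mixed in 70 random proportions on a 600-point grid, matching the I × K shape above. Without noise the matrix has rank 3, the number of species. Once even small noise is added, the numerical rank jumps to min(I, K) = 70, so the chemically meaningful rank has to be read from the gap in the singular values instead.

```python
import numpy as np

rng = np.random.default_rng(0)
wavelengths = np.linspace(400.0, 880.0, 600)        # K = 600 points
centers = (475.0, 560.0, 690.0)                     # hypothetical band maxima (nm)
pure = np.array([np.exp(-((wavelengths - c) / 25.0) ** 2) for c in centers])
conc = rng.random((70, 3))                          # I = 70 mixtures, 3 species
D = conc @ pure                                     # Beer's model: D = C S
noisy = D + 1e-3 * rng.normal(size=D.shape)         # add measurement noise

print(np.linalg.matrix_rank(D))      # rank of the noise-free mixtures
print(np.linalg.matrix_rank(noisy))  # rank once noise is present
sv = np.linalg.svd(noisy, compute_uv=False)
print(np.round(sv[:6], 4))           # singular values, largest first
```

Cross validation, described next, is one way to decide where that gap lies without choosing a threshold by eye.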


Cross  validation 

The cross validation for principal components analysis contained in SIMCA, a set of pattern recognition computer programs, was used for the present problem. SIMCA's program module CPR1N was used with the cross validation option. In cross validation, a subset of the data is excluded from the data set. A model is then developed, and the excluded data values are estimated (predicted) by using the model. The sum of the squared differences between each true value and the corresponding predicted value is the predicted residual error sum of squares (PRESS). Next, the excluded subset is returned to the modeled data set, and a different subset of the data is excluded. Again, a model is developed and used to predict the excluded subset. This process continues until all data have been excluded and predicted once for each value of J (the number of components). If, after allowing for degrees of freedom, PRESS continues to decrease upon addition of component J, component J is assumed to model nonrandom variation in the data. However, if PRESS for component J is greater than PRESS for component J - 1, component J is assumed to be modeling only random noise in the data. In this case component J should not be used, and we assume that there are J - 1 linearly independent components in the entire data set. If the spectra of the individual chemical species add linearly, i.e., if Beer's law is obeyed, this number is the same as the number of absorbing species in the solutions from which the spectra were obtained.

TABLE 4

Cross validation results for determining the number of linearly independent components in the fluoride and oxalate spectral data matrices

       Fluoride                            Oxalate
J      Variance         PRESS_J /          Variance         PRESS_J /
       explained (%)    PRESS_(J-1) *      explained (%)    PRESS_(J-1) *
1      75.15            0.50               90.59            0.31
2      21.75            0.36                6.67            0.55
3       1.42            0.75                1.48            0.69
4       0.49            0.86                0.53            0.78
5       0.42            0.83                0.30            0.79
6       0.26            0.86               (illegible)      0.88
7       0.14            0.87                0.05            0.97
8       0.06            0.99                0.04            0.96
9       0.05            0.96                0.03 **         1.00
10      0.03 **         1.00                0.02            1.00

* For J = 1, PRESS for J - 1 is based on the variance explained by using the average values.
** A strict interpretation of the cross validation results shows that there are nine and eight components in the fluoride and oxalate data sets, respectively.
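The leave-out-and-predict loop described above can be sketched in Python with NumPy. This is not SIMCA's algorithm, and it omits the degrees-of-freedom allowance: as an assumption, rows are left out in interleaved groups and every left-out value is predicted from the remaining variables of the same spectrum. (Simply projecting a whole left-out row onto the loadings would make PRESS fall every time a component is added, so it could never flag a noise component.) The function name `press_pca` and the default of seven groups are illustrative.

```python
import numpy as np


def press_pca(X, max_comp, n_groups=7):
    """PRESS for J = 0..max_comp components of a mean-centred PCA model.

    Rows are left out in n_groups interleaved groups. Each left-out value
    X[i, v] is predicted from the other variables of the same row, so adding
    a component does not lower PRESS automatically.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    groups = np.arange(n) % n_groups
    press = np.zeros(max_comp + 1)
    for g in range(n_groups):
        train, test = X[groups != g], X[groups == g]
        mean = train.mean(axis=0)
        train_c, test_c = train - mean, test - mean
        press[0] += np.sum(test_c ** 2)  # J = 0: predict with the column means
        _, _, vt = np.linalg.svd(train_c, full_matrices=False)
        for j in range(1, max_comp + 1):
            P = vt[:j].T  # k x j loadings estimated from the training rows only
            for v in range(k):
                keep = np.arange(k) != v
                # scores estimated without variable v, then used to predict it
                t, *_ = np.linalg.lstsq(P[keep], test_c[:, keep].T, rcond=None)
                press[j] += np.sum((test_c[:, v] - P[v] @ t) ** 2)
    return press


# Synthetic example: three real components plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 12)) + 0.01 * rng.normal(size=(40, 12))
press = press_pca(X, max_comp=6)
ratios = press[1:] / press[:-1]  # PRESS_J / PRESS_(J-1), the quantity tabulated in Table 4
```

On the synthetic rank-3 data, PRESS drops sharply up to J = 3 and gains little or nothing afterwards, which is the pattern the PRESS ratio is meant to expose.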

The data variables were not scaled. Two different SIMCA runs were made, one for the fluoride and one for the oxalate data set, with each spectrum normalized to a sum of 1.0. The results of these two analyses are listed in Table 4 in terms of the ratio of PRESS for J components to PRESS for J - 1 components. The variance explained by each component is also tabulated. These PRESS ratios indicate nine components for the fluoride spectral data set and eight components for the oxalate spectral data set. In the absence of fluoride or oxalate, studies indicated five or six components. Thus, the addition of fluoride or oxalate to solutions of Pu(IV)-nitric acid (4 M to 10 M) adds about three or four observable components.
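Applied to the rounded ratios in Table 4, the stopping rule can be written in a few lines of Python. Two simplifications are assumptions of this sketch: the degrees-of-freedom allowance mentioned above is ignored, and a tabulated ratio of 1.00 is treated as "no decrease", since the printed values are rounded to two decimals. The function name is illustrative.

```python
def n_components_from_ratios(ratios, threshold=1.0):
    """Return J - 1 for the first J whose PRESS_J / PRESS_(J-1) >= threshold.

    ratios[0] is the ratio for J = 1. If no ratio reaches the threshold,
    every tabulated component is retained.
    """
    for j, r in enumerate(ratios, start=1):
        if r >= threshold:
            return j - 1
    return len(ratios)


# PRESS ratios for J = 1..10 from Table 4
fluoride = [0.50, 0.36, 0.75, 0.86, 0.83, 0.86, 0.87, 0.99, 0.96, 1.00]
oxalate = [0.31, 0.55, 0.69, 0.78, 0.79, 0.88, 0.97, 0.96, 1.00, 1.00]
print(n_components_from_ratios(fluoride))  # 9
print(n_components_from_ratios(oxalate))   # 8
```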

In this study we applied SIMCA and the nearest neighbor, Bayes quadratic classifier, and linear learning machine methods from ADAPT [18] to investigate their usefulness for classifying the fluoride or oxalate Pu(IV)-nitric acid solutions. For the ADAPT analysis, the Rd values were used to divide the fluoride spectral data set into 'good' and 'bad' categories. For the SIMCA pattern recognition approach, the data were not divided into separate categories before analysis, because the separation can be seen visually when certain of the sample scores are plotted.

ADAPT  results  on  fluoride  spectra 

The classification results appear in Table 5. The input data to these pattern recognition methods consisted of the principal component scores of the spectra rather than the spectra themselves. All the methods were able to separate spectra representing good and bad Rd values reasonably well. The linear learning machine correctly categorized all 70 spectra, and the Bayes quadratic classifier missed only 1 out of 70. The nearest neighbor results vary a little depending on the number of voting neighbors. Apparently three, five, or seven voting neighbors give equivalent results, but none is as good as the Bayes or learning machine methods.
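The nearest neighbor rule summarized in Table 5 is simple to state. ADAPT's implementation is not given in the paper, so the following is only an assumed sketch: Euclidean distance on principal component scores, a majority vote among the k nearest training spectra, and leave-one-out counting reported as (correct, incorrect) per class, mirroring the layout of Table 5. With two classes, an odd k (1, 3, 5, or 7, as in the table) avoids tied votes. The function names and the example data are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dist = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dist, kind="stable")[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def loo_knn_counts(X, y, k):
    """Leave-one-out k-NN; returns {class: (n_correct, n_incorrect)}."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    counts = {c: [0, 0] for c in np.unique(y)}
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        pred = knn_predict(X[mask], y[mask], X[i], k)
        counts[y[i]][0 if pred == y[i] else 1] += 1
    return {c: tuple(v) for c, v in counts.items()}


# Synthetic two-component scores: 26 'good' and 44 'bad' samples, well separated.
rng = np.random.default_rng(1)
scores = np.vstack([rng.normal(0.0, 0.3, size=(26, 2)), rng.normal(3.0, 0.3, size=(44, 2))])
labels = np.array(["good"] * 26 + ["bad"] * 44)
counts = loo_knn_counts(scores, labels, k=3)
```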

TABLE 5

Pattern recognition summary results for fluoride spectra using the Bayes quadratic classifier, linear learning machine, and nearest neighbor algorithm in ADAPT

                               Good samples (high Rd)     Bad samples (low Rd)
Method             No. of      No.        No.             No.        No.
                   neighbors   correct    incorrect       correct    incorrect
Bayes                          26         0               43         1
Learning machine               26         0               44         0
Nearest neighbor   1           22         4               36         8
                   3           23         3               39         5
                   5           24         2               38         6
                   7           23         3               39         5

SIMCA plots for fluoride and oxalate spectra

We developed a six-component model using SIMCA and the principal components of Vis-NIR spectra obtained from 39 solutions. The 39 solutions contained only nitric acid, ranging from about 1 M to 14 M, and Pu(IV); no fluoride or oxalate was in the data set. (SIMCA mean centers and autoscales the spectral intensities before calculating the principal components.) Here we use the first two principal components derived by this model to compare how the spectra of the fluoride and oxalate data plot as compared with those of Pu(IV)-nitric acid.

The scores of the first two components for the 39 training samples are plotted in Fig. 7, which shows a nice semicircular trend of increasing C_HNO3 molarity from the top left to the middle right of the graph. The numbers in the figure with an appended H designate total nitric acid molarity, C_HNO3. The numbers with a prefixed T were all between 6.5 and 8.5 M nitric acid, with nitric acid molarity increasing from left to right. The desirable samples, from an ion exchange perspective, plot at the bottom of the figure as 7H. Clearly, given the location of a solution containing only Pu(IV) and nitric acid on this figure, the approximate quantity of acid or base to add for adjusting the solution chemistry in a desired direction could be specified.

Fig. 7. Plot of first two principal components of 39 Pu(IV)-nitric acid samples making up the training set (horizontal axis: Principal Component 1).

This same principal components model was used to calculate scores for all samples of the fluoride and oxalate data sets. The scores of the first two principal components are plotted together with those of some of the training samples in Figs. 8 and 9 for fluoride and oxalate samples, respectively. (Training samples are in bold print.) Fig. 8 shows that the fluoride spectra plot in the plane above the semicircle defined by the training set. Again, the numbers refer to nitric acid molarity and the Fs to fluoride samples. For a constant nitric acid molarity, greater fluoride-to-plutonium concentration ratios plot higher in the graph. If aluminum were added to complex the fluoride in an unknown solution that plotted in the middle of Fig. 8, its position in this graph would move down and to the right. Upon arriving at the semicircle representing the training set, a base, such as sodium hydroxide, could be added to the solution until the measured spectrum's principal components plotted between 7 M and 8 M C_HNO3, i.e., 7H and 8H. In this way, Vis-NIR spectrometry could be used for real-time adjustment of the solution chemistry to arrive at a system desirable for ion exchange.

Fig. 8. Plot of first two principal components for fluoride samples and some Pu(IV)-nitric acid training samples (horizontal axis: Principal Component 1).

Fig. 9. Plot of first two principal components for oxalate samples and some Pu(IV)-nitric acid training samples.
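Figs. 8 and 9 rely on scoring new spectra with a model fitted only to the 39 training spectra. A minimal NumPy sketch of that step follows, assuming the autoscaling described for SIMCA (mean centring, then division by each variable's training-set standard deviation) and loadings taken from a singular value decomposition; SIMCA's own implementation details are not reproduced, and the function names are hypothetical. The essential point is that new spectra are centred and scaled with the training statistics, not with their own.

```python
import numpy as np


def fit_autoscaled_pca(train, n_comp):
    """Mean-centre and autoscale the training spectra, then get PCA loadings by SVD."""
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    std = train.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant variables
    Z = (train - mean) / std
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, vt[:n_comp].T  # loadings are the columns


def project_scores(spectra, model):
    """Scores of (new) spectra on the training model, using the training mean and std."""
    mean, std, P = model
    return ((np.asarray(spectra, dtype=float) - mean) / std) @ P


# Synthetic stand-in for 39 training spectra with 8 variables.
rng = np.random.default_rng(2)
train = rng.normal(size=(39, 8)) @ rng.normal(size=(8, 8))
model = fit_autoscaled_pca(train, n_comp=2)
scores = project_scores(train, model)  # first two principal component scores
```

Scores for fluoride or oxalate spectra would be obtained the same way, by passing those spectra to `project_scores` with the unchanged training model.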

The first two principal component scores of the oxalate spectra, together with those of some of the training samples, are plotted in Fig. 9. In contrast to the fluoride samples, these samples generally plot on or below the semicircle defined by the training samples. This plot verifies our earlier observation that oxalate does not have as much effect on the Pu(IV)-nitric acid spectra as does fluoride (Fig. 6), which agrees with the known chemistry of these systems.


SUMMARY 

We have shown that Vis-NIR spectra of Pu(IV)-nitric acid solutions containing either fluoride or oxalate provide information concerning the solution chemistry. Pattern recognition methods based on the spectra can be used to determine the chemical character of the solutions. Plots of principal component scores provide information about the solution chemistry that could be used to adjust solution conditions to desired states.
This paper is an ideal contribution to the Statistics in Chemistry conference held in College Station. The authors present several challenging problems, which they address in an intelligent fashion using PLS regression and a variety of pattern recognition techniques.

Their most successful application is in the determination of Pu(III) and nitric acid using PLS regression. The results on test samples given in Table 2 provide strong evidence that the authors can predict unknown concentrations. Perhaps the authors might comment on any operator or technician effects. Obviously, they are adept at using the LT Industries Quantum 1200 device. In routine operations by less skilled technicians, would the performance be as good?

The questions related to qualitative determination of Pu(IV) complex composition are clearly more difficult, and the results are not so clear-cut. I am a little unclear on the results in Figs. 5 and 6 for the zero fluoride and oxalate concentrations. If five concentrations were used, where is the fifth curve?

Figs. 7-9 are curious. Many times statisticians neglect the very useful technique of designating points on plots as a value-added characteristic. Fig. 7 seems (unfortunately) to set a standard by which we view Figs. 8 and 9. The scatter in Figs. 8 and 9 is much greater than in Fig. 7. What fraction of the variation is explained by the first two principal components?

One final comment concerns Table 5. Although Bayes and learning machine dominate the nearest neighbor procedures here, I am unwilling as yet to dismiss nearest neighbor (the authors do not suggest this, but a reader might inadvertently conclude as much). I suspect that if the data were a tad more 'noisy', nearest neighbor might make a comeback. What is it about the authors' application that favors Bayes and learning machine?
Abstract


Kim, Y.-I. and Nachtsheim, C.J., 1991. Transformation robust experimental design with application to some problems in chemistry. Chemometrics and Intelligent Laboratory Systems, 10: 261-270.

In this paper we consider the selection of an appropriate experimental design when the exact form of the error distribution is unknown. The goal of error-robust design is to design an experiment so that the 'ill-effects' resulting from a lack of knowledge of the error structure will be minimal. Numerical algorithms for computer construction of error-robust designs are developed, and the method is illustrated in connection with the design of experiments for nonlinear modeling of chemical reactions.


1  INTRODUCTION 

The examination of standard statistical techniques in order to determine their sensitivity to assumptions, and the development of new techniques that are insensitive to assumptions, have been major areas of statistical research in the last two decades. Experimental design is an area in which it is particularly important to investigate questions of robustness, because an experimenter's assumptions about the experimental process are critical in determining the design. Furthermore, the design must be chosen before the data are collected and so cannot be discarded if the data indicate that the assumptions are seriously violated. Thus it is important to examine experimental designs for their sensitivity to assumptions.


Generally, we observe that the design chosen will explicitly depend on the experimenter's

(1) design criterion;

(2) definition of the design space;

(3) a priori specification of the model.

By 'model' we mean the distribution of a response $y(x)$ at some point $x$ in the design space $\chi$. Unfortunately, precise a priori specification of points (1) to (3) is often difficult in practice. This fact has led statisticians to search for ways of constructing designs where one or more of the items listed cannot be so explicitly stated.

For example, with regard to (1) above, Box [1] stressed the need to design experiments with many, sometimes conflicting, goals in mind, not just the one implied by a single design criterion. Kiefer [2] examined the robustness of optimal designs to changes in criteria. Welch [3] presented a method for cataloguing designs that are optimal by one criterion, so that further comparisons among these optimal designs could be made on the basis of other criteria.

The question of robustness to assumptions concerning the true model $\eta$ has been widely studied. Two different, but complementary, approaches have been taken. The first approach has sought designs that will yield reasonable results for the proposed model even though it is known to be inexact. Steinberg and Hunter [4] call these 'model-robust designs'. For examples of work in this realm, see refs. 5-9. The second approach has focused on developing designs that facilitate improvement of the proposed model by trying to highlight suspected inadequacies. Steinberg and Hunter call these designs 'model-sensitive designs'. Examples are given in refs. 10-17, among others.

Special 'model-robustness' problems arise in the design of experiments for nonlinear models. This is because the best design depends, in general, on the unknown parameter values. Investigators are thus placed in the paradoxical position of having to know at the design stage (at least approximately) the very quantities that they are conducting the experiment to estimate. Little has been done to assess the robustness of nonlinear designs to misspecification of $\theta$. (Chaloner and Larntz [18] develop a Bayesian approach in which only a prior distribution for $\theta$ is required.) For reviews of nonlinear designs, see refs. 19 and 20, among others.

A final area of robustness concerns the sensitivity of designs to the specification of the error structure. The occurrence of outliers and missing observations represents two ways in which these assumptions may be violated. A number of authors have studied design in such circumstances. See, for example, refs. 8 and 21-23 regarding design in the presence of outliers. Also see refs. 24 and 25 concerning design when missing data might be a problem. Concerning lack of independence in the error terms, see refs. 26 and 27.

Surprisingly little has been done, however, with regard to designs that are robust to general misspecification of the error structure. In what follows, we consider the construction of such designs. The only relevant paper on this issue that we found is ref. 28, whose authors applied a 'power transformation weighting' technique to develop sequential experimental designs for precise estimation of the model and transformation parameters together.

This paper has the following structure. We first review the design of experiments in the presence of known, non-constant variance in Section 2. In Section 3, a general definition of error-robustness is developed and a number of examples are considered. Carroll and Ruppert [29] recently advocated a new method (power transformation on both sides, PTBS) for simultaneous estimation of the regression parameters and the index $\lambda$ of the 'best' power transformation. We show in Section 4 that designs that are robust (in a sense to be described) to the eventual specification of $\lambda$ are related to error-robust designs. Two important examples from the literature are studied in Section 5. Some closing remarks are given in Section 6.


2 OPTIMAL DESIGN IN THE PRESENCE OF NON-CONSTANT VARIANCE


2.1 Notation


In what follows, we assume that responses are independent, having mean $E(y(x)) = \eta(x,\theta)$ and variance $\mathrm{Var}(y(x)) = \sigma^2(x,\lambda)$, where $\theta$ and $\lambda$ are unknown parameter vectors of dimensions $p$ and $q$, respectively. We use the term error function in connection with $\sigma^2(x,\lambda)$; its inverse, $\sigma^{-2}(x,\lambda)$, is termed the efficiency function, where we shall assume $0 < \sigma^2(x,\lambda) < \infty$. For brevity we will often use the abbreviated form $\sigma^2(x)$.

Consider an $N$-point experiment in which $n_i$ observations are taken at the points $x_i$ for $i = 1, 2, \ldots, n$, such that $\sum_i n_i = N$. Such an experiment can be described by a measure $\xi_{[N]}$ as follows:

$$\xi_{[N]}(x) = \begin{cases} n_i, & \text{if } x = x_i \in \{x_1, \ldots, x_n\},\\ 0, & \text{otherwise.} \end{cases}$$

Let $S(\xi_{[N]}) = \{x_1, \ldots, x_n\}$ denote the support of $\xi_{[N]}$. Note that if $\xi_N = \xi_{[N]}/N$, then $\xi_N$ is a discrete probability measure on $\chi$. Thus, an exact or discrete experimental design is a probability measure on the design space $\chi$ subject to the restriction that $N\xi_N(x)$ is an integer.

Removing the restriction that $\xi_N(x)$ be a multiple of $1/N$, the set of approximate experimental designs on $\chi$ is denoted by

$$\Xi = \Big\{\xi : \int_\chi d\xi(x) = 1,\ \xi(x) \ge 0 \text{ for every } x \in \chi\Big\}.$$

An (approximate) design problem, specified by the triplet $(\eta, \sigma^2, \chi)$, is solved by selection of an approximate design $\xi \in \Xi$ for the model $\eta$, the design space $\chi$ and the error function $\sigma^2$. Note that in many design problems an exact design can be approximated by an approximate design $\xi$.


2.2 Measures of optimality


We assume that least squares estimates $\hat\theta$ of the parameter $\theta$ are to be obtained. Let $f(x,\theta) = \partial\eta(x,\theta)/\partial\theta$ and

$$F(\theta) = [f(x_1,\theta), \ldots, f(x_N,\theta)]^{T}.$$

Then for these estimates, the asymptotic covariance is given by

$$\mathrm{Var}(\hat\theta) = [F^{T}(\theta)V^{-1}F(\theta)]^{-1},$$

where $V = \mathrm{diag}\{\sigma^2(x_1,\lambda), \ldots, \sigma^2(x_N,\lambda)\}$. For linear models, the so-called design matrix, $X = F(\theta)$, is independent of $\theta$. For any $N$-point discrete design $\xi_N$, we have

$$F^{T}(\theta)V^{-1}F(\theta) = N\sum_{x\in S(\xi_N)}\sigma^{-2}(x,\lambda)f(x,\theta)f^{T}(x,\theta)\,\xi_N(x) = N\int_\chi\sigma^{-2}(x,\lambda)f(x,\theta)f^{T}(x,\theta)\,d\xi_N(x),$$

and hence the $(i,j)$th element of $F^{T}(\theta)V^{-1}F(\theta)/N$ is $\sigma^{-2}(x,\lambda)f_i(x,\theta)f_j(x,\theta)$, averaged with respect to the discrete probability measure $\xi_N$. In general, the normalized information matrix of an experimental design $\xi$ is

$$M(\xi,\theta) = \int_\chi\sigma^{-2}(x,\lambda)f(x,\theta)f^{T}(x,\theta)\,d\xi(x).$$

The dispersion matrix $M^{-1}(\xi,\theta)$ is sometimes written $D(\xi,\theta)$.

Many criteria have been proposed for optimizing the selection of a design $\xi$ for the design problem $(\eta, \sigma^2, \chi)$. Generally, the criteria are based on some functional of the information matrix, $M(\xi,\theta)$. Motivation for such criteria is often based on the properties of the resulting least squares estimate $\hat\theta$. For example, a design $\xi_D$ is defined to be D-optimal for $(\eta, \sigma^2, \chi)$ and prior estimate $\theta_0$ if

$$\max_{\xi\in\Xi}|M(\xi,\theta_0)| = |M(\xi_D,\theta_0)|.$$

By definition, D-optimal designs minimize the (asymptotic) generalized variance of the least squares estimate of $\theta$.

Alternatively, suppose that an experimenter is concerned with prediction. The least squares estimate of the mean response at a point $x$ is $\hat y(x) = \eta(x,\hat\theta)$, and

$$N\,\mathrm{Var}(\hat y(x)) = N\,\mathrm{Var}(\eta(x,\hat\theta)) \approx f^{T}(x,\theta)D(\xi,\theta)f(x,\theta) \equiv d(x,\xi,\theta).$$

G-optimal designs minimize the maximum normalized variance of prediction, $\sigma^{-2}(x,\lambda)d(x,\xi,\theta)$. Formally, a design $\xi^*$ is G-optimal if

$$\min_{\xi\in\Xi}\max_{x\in\chi}\sigma^{-2}(x,\lambda)\,d(x,\xi,\theta_0) = \max_{x\in\chi}\sigma^{-2}(x,\lambda)\,d(x,\xi^*,\theta_0).$$

The D-efficiency of a design $\xi$ for $(\eta, \sigma^2, \chi)$ and prior estimate $\theta_0$, with respect to a design $\xi'$, is

$$D(\xi,\xi',(\eta,\sigma^2,\chi)) = \left\{\frac{\det M(\xi,\theta_0)}{\det M(\xi',\theta_0)}\right\}^{1/p}.$$

Similarly, the G-efficiency of a design $\xi$ for $(\eta, \sigma^2, \chi)$ and prior value $\theta_0$, with respect to $\xi'$, is

$$G(\xi,\xi',(\eta,\sigma^2,\chi)) = \frac{\max_{x\in\chi}\sigma^{-2}(x,\lambda)\,d(x,\xi',\theta_0)}{\max_{x\in\chi}\sigma^{-2}(x,\lambda)\,d(x,\xi,\theta_0)}.$$
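To make these definitions concrete, a small NumPy sketch can compute the normalized information matrix of a discrete design and the D-efficiency of one design relative to another. The quadratic regressor $f^T(x) = (1, x, x^2)$ anticipates the example used later in the paper; the four-point comparison design and the helper names are hypothetical.

```python
import numpy as np


def info_matrix(points, weights, f, sigma2):
    """Normalized information matrix M(xi) = sum_i w_i sigma^-2(x_i) f(x_i) f(x_i)^T."""
    F = np.array([f(x) for x in points], dtype=float)
    c = np.asarray(weights, dtype=float) / np.array([sigma2(x) for x in points], dtype=float)
    return (F * c[:, None]).T @ F


def d_efficiency(design, reference, f, sigma2):
    """{det M(design) / det M(reference)}^(1/p), designs given as (points, weights)."""
    p = len(f(0.0))
    num = np.linalg.det(info_matrix(*design, f, sigma2))
    den = np.linalg.det(info_matrix(*reference, f, sigma2))
    return (num / den) ** (1.0 / p)


def f(x):
    return np.array([1.0, x, x * x])  # quadratic regression, f^T(x) = (1, x, x^2)


def const(x):
    return 1.0  # constant error function


xi_3pt = ([-1.0, 0.0, 1.0], [1 / 3] * 3)
xi_4pt = ([-1.0, -0.5, 0.5, 1.0], [0.25] * 4)
print(d_efficiency(xi_4pt, xi_3pt, f, const))  # about 0.840
```

Because $p$ does not change, multiplying $\sigma^2$ by a constant rescales both determinants equally, so only the shape of the error function matters for efficiencies.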

The following result, given by Kiefer and Wolfowitz [30] in the context of linear models and later [31] in the context of nonlinear models, shows that D- and G-optimal designs are equivalent.

THEOREM 1. The following conditions are equivalent:

(a) $\xi^*$ is D-optimal;

(b) $\xi^*$ is G-optimal;

(c) $\max_{x\in\chi}\sigma^{-2}(x,\lambda)\,d(x,\xi^*,\theta_0) = p$.

The set of all designs satisfying these conditions is convex, and the corresponding information matrices are identical.

The equivalence of conditions (a) and (c) yields a simple method for checking the optimality of a candidate design $\xi$. If the maximum normalized prediction variance is greater than $p$, then $\xi$ is not D-(G-)optimal. Numerical algorithms [32] for constructing D-(G-)optimal designs make direct use of this condition.

We note that in practice $\sigma^2(x,\lambda)$ is usually assumed constant. The impact of this assumption can be illustrated by the following example. Suppose $\eta(x,\theta) = f^{T}(x)\theta$, where $f^{T}(x) = (1, x, x^2)$ and $\chi = [-1, 1]$. Suppose also that $\sigma^2(x,\lambda) \propto (\lambda - 1)x + \lambda + 1$ for $\lambda \ge 1$. Thus, the error variance increases linearly with slope $(\lambda - 1)$ over the design space $\chi$, and if $\lambda = 1$, $\sigma^2(x,\lambda)$ is constant. Table 1 shows D-(G-)optimal designs for various $\lambda$. Note that as the value of $\lambda$ increases, the design shifts the middle support point toward the left side of the design space. Surprisingly, the D-optimal design shifts mass toward the lower variance (high efficiency) region of the design space. This pattern has consistently appeared in worked examples; for further results see refs. 32 and 33. We note from the table that the G-efficiencies are monotonically decreasing in $\lambda$. For example, with $\lambda = 9$, the G-efficiency of $\xi_1$ (the optimal design for $\lambda = 1$) is 0.888. This very simple example illustrates the nonrobustness of the usual optimal design and, we think, motivates the need for the study of designs that are robust to misspecification of $\sigma^2$. In the following section we introduce the concept of error-robustness and develop methods for constructing robust designs.

TABLE 1

Location of interior points (x_λ) of G-optimal designs (ξ_λ) for quadratic regression for various λ in σ²(x, λ)

λ     Interior point x_λ     G-efficiency of ξ₁
1      0                     1.000
3     -0.141191              0.958
5     -0.183268              0.924
7     -0.221089              0.902
9     -0.241081              0.888
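The last column of Table 1 can be checked numerically. Because Theorem 1 gives $p = 3$ as the maximum normalized variance of the D-optimal design for each $\lambda$, the G-efficiency of $\xi_1$ reduces to $p / \max_x \sigma^{-2}(x,\lambda)\,d(x,\xi_1)$. The sketch below assumes $\xi_1$ is equal mass on $-1, 0, 1$ (the D-optimal quadratic design under constant variance) and replaces exact maximization over $\chi$ with a grid of 2001 points; the helper names are hypothetical.

```python
import numpy as np

P = 3  # number of parameters in quadratic regression


def f_quad(x):
    """Rows f(x) = (1, x, x^2)."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x ** 2], axis=-1)


def sigma2(x, lam):
    """Error function proportional to (lam - 1) x + lam + 1."""
    return (lam - 1.0) * np.asarray(x, dtype=float) + lam + 1.0


def max_normalized_variance(points, weights, lam, grid):
    """Max over the grid of sigma^-2(x, lam) d(x, xi), with M built for the same lam."""
    F = f_quad(points)
    M = (F * (np.asarray(weights) / sigma2(points, lam))[:, None]).T @ F
    G = f_quad(grid)
    d = np.einsum("ij,jk,ik->i", G, np.linalg.inv(M), G)
    return float(np.max(d / sigma2(grid, lam)))


def g_efficiency(points, weights, lam, grid):
    """G-efficiency relative to the D-optimal design, whose maximum equals P (Theorem 1)."""
    return P / max_normalized_variance(points, weights, lam, grid)


GRID = np.linspace(-1.0, 1.0, 2001)
XI1 = (np.array([-1.0, 0.0, 1.0]), np.full(3, 1.0 / 3.0))
effs = [g_efficiency(*XI1, lam, GRID) for lam in (1, 3, 5, 7, 9)]
print([round(e, 3) for e in effs])  # [1.0, 0.958, 0.924, 0.902, 0.888]
```

The same function doubles as the check of condition (c): `max_normalized_variance(*XI1, 1, GRID)` equals 3, while for $\lambda = 3$ it exceeds 3, so $\xi_1$ is not optimal there.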


3  ERROR-ROBUST  DESIGN 

The concept of an error function is critical in both design and analysis. In data analysis contexts, graphical examination of scatterplots of residuals versus predictors or fitted values is used to detect nonconstant variance. A systematic megaphone shape in the plot would indicate that the variance of the response depends on the quantity plotted on the x-axis. Cook and Weisberg [34] suggested an alternative approach for diagnosing non-constant error terms. It involves expansion of the regression model by assuming a particular, though widely applicable, functional form for the variance:

$$\mathrm{Var}(y(x)) \propto \exp(\lambda^{T}x),$$

where $\lambda$ is an unknown parameter vector. Cook and Weisberg utilized this definition to propose the score test and the equivalent graphical method for testing the assumption of constant error terms in linear regression. Many of the error functions commonly encountered in data analysis arise as special cases of this important, general form. Suppose we expand $\mathrm{Var}(y(x)) \propto \exp(\lambda^{T}x)$ about $x = 0$ in a single dimension. Then, to second order,

$$\mathrm{Var}(y(x)) \propto 1 + \lambda x + \lambda^{2}x^{2}/2,$$

and we specify $\sigma^2(x)$ as proportional to a quadratic function of $x$. Retaining only the first-order term implies that $\sigma^2(x)$ is a linear function of $x$, which may be a very natural assumption in a comparatively narrow range.

The results of Section 2 indicate that optimal designs depend on the model specification $\eta$ as well as on the variance function $\sigma^2$. As stated previously, it is typically the case in practice that the variance function $\sigma^2(x,\lambda)$ cannot be determined before experimentation. Given that the true error function $\sigma^2(x,\lambda)$ is unknown, we will consider a design $\xi$ to be robust to specification of $\sigma^2(x,\lambda)$ if $\xi$ is highly efficient for error functions likely to be encountered in practice. More specifically, we shall assume that $\sigma^2$ is an unknown element of some known space of error functions, $\mathcal{E}$. We will then attempt to characterize designs that are efficient, in a sense to be described, for all possible $\sigma^2 \in \mathcal{E}$. To do so, we shall require the following result, due to Atwood [35], which relates the D- and G-efficiencies of a design.

THEOREM 2. Let $\xi_{\sigma^2}$ be the D-optimal design for $(\eta, \sigma^2, \chi)$. Then for any design $\xi$ in $\Xi$,

$$D(\xi,\xi_{\sigma^2},(\eta,\sigma^2,\chi)) \ge G(\xi,\xi_{\sigma^2},(\eta,\sigma^2,\chi)).$$

That is, G-efficiency provides a lower bound for the D-efficiency of a design $\xi$ with respect to the D-optimal design $\xi_{\sigma^2}$. Following Thibodeau [8], in the context of model robustness, we attempt to construct designs having high D-efficiency for each $\sigma^2 \in \mathcal{E}$ by maximizing this lower bound. Loosely speaking, we will consider a design error-robust if its G-efficiency is high for every $\sigma^2 \in \mathcal{E}$. Thus, no matter what the subsequent analysis indicates regarding the choice of $\sigma^2$, the D-efficiency of the design will be relatively high. Formally, we have the following.

Definition 1. The design $\xi^* \in \Xi$ is error-robust if and only if

$$\max_{\xi\in\Xi}\,\min_{\sigma^2\in\mathcal{E}}G(\xi,\xi_{\sigma^2},(\eta,\sigma^2,\chi)) = \min_{\sigma^2\in\mathcal{E}}G(\xi^*,\xi_{\sigma^2},(\eta,\sigma^2,\chi)).$$

Notice that because the number of parameters in the model, $p$, does not change with $\sigma^2$, Definition 1 indicates that a design is error-robust if and only if

$$\min_{\xi\in\Xi}\,\max_{\sigma^2\in\mathcal{E}}\,\max_{x\in\chi}\sigma^{-2}(x)\,d(x,\xi,\theta_0) = \max_{\sigma^2\in\mathcal{E}}\,\max_{x\in\chi}\sigma^{-2}(x)\,d(x,\xi^*,\theta_0),$$

where $d$ is computed from the information matrix for the same $\sigma^2$. Thus the error-robust design minimizes the 'worst case' normalized maximum variance of fitted values.

In most instances, analytic characterization of the error-robust design is impossible, and numerical methods are required; see Kim [33] for some notable exceptions. The following algorithm, which is a simple modification of one by Fedorov [32], can be used for computer construction of error-robust designs.

Algorithm 1

1. Specify a nonsingular starting design $\xi_1$. Set $l = 1$.

2. Find $x_l$ such that
$$\max_{\sigma^2\in\mathcal{E}}\sigma^{-2}(x_l)\,d(x_l,\xi_l,\theta_0) = \max_{\sigma^2\in\mathcal{E}}\,\max_{x\in\chi}\sigma^{-2}(x)\,d(x,\xi_l,\theta_0).$$

3. Let $\alpha_l = 1/(l + s)$, $s > 0$, and form $\xi_{l+1} = (1 - \alpha_l)\xi_l + \alpha_l\xi_{x_l}$, where $\xi_{x_l}$ places unit mass at $x_l$. Update $D$.

4. Check for convergence. One simple approach is as follows. Assume $k \ge 2$ is a user-defined integer; typically, $k = 5$. Let
$$\delta_j = \max_{\sigma^2\in\mathcal{E}}\,\max_{x\in\chi}\sigma^{-2}(x)\,d(x,\xi_{l-j+1},\theta_0), \qquad 1 \le j \le \min(l,k).$$
Let $s_\delta^2$ be the sample variance of the $\{\delta_j\}$. If $l \ge k$ and $s_\delta^2$ is sufficiently small, stop. Otherwise, set $l = l + 1$ and go to step 2.

Note that the sequence $\{\alpha_l\}$, as specified above, will not, in general, lead to monotonically decreasing $\sigma^{-2}(x_l)\,d(x_l,\xi_l,\theta_0)$.
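A minimal NumPy sketch of Algorithm 1 for the quadratic example with $\sigma^2(x) \propto (\lambda - 1)x + \lambda + 1$ is given below. Several choices are assumptions, not the paper's: step 2 maximizes over a fixed grid of 401 points on $[-1, 1]$ instead of over all of $\chi$, $s$ is set to the number of starting support points (so the first step does not jump to a singular point mass), the convergence test of step 4 is applied to the current and previous designs before each update, and an iteration cap is added. Function names are hypothetical.

```python
import numpy as np


def f_quad(x):
    """Rows f(x) = (1, x, x^2)."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x ** 2], axis=-1)


def sigma2(x, lam):
    """Error function proportional to (lam - 1) x + lam + 1 (positive on [-1, 1])."""
    return (lam - 1.0) * np.asarray(x, dtype=float) + lam + 1.0


def worst_case(points, weights, grid, lams):
    """Max over lam in lams and x in grid of sigma^-2(x) d(x, xi).

    M, and hence d, is rebuilt for each candidate error function.
    Returns (criterion value, maximizing x).
    """
    F, G = f_quad(points), f_quad(grid)
    best_val, best_x = -np.inf, None
    for lam in lams:
        M = (F * (weights / sigma2(points, lam))[:, None]).T @ F
        d = np.einsum("ij,jk,ik->i", G, np.linalg.inv(M), G)
        norm_var = d / sigma2(grid, lam)
        i = int(np.argmax(norm_var))
        if norm_var[i] > best_val:
            best_val, best_x = float(norm_var[i]), float(grid[i])
    return best_val, best_x


def error_robust_design(lams, start_points, n_iter=2000, k=5, tol=1e-10):
    """Sketch of Algorithm 1 with a grid search in step 2."""
    grid = np.linspace(-1.0, 1.0, 401)
    points = np.asarray(start_points, dtype=float)
    weights = np.full(points.size, 1.0 / points.size)
    s = points.size  # assumed choice of s, so alpha_1 < 1
    history = []
    for l in range(1, n_iter + 1):
        val, x_l = worst_case(points, weights, grid, lams)  # step 2
        history.append(val)
        if l >= k and np.var(history[-k:], ddof=1) < tol:  # step 4
            break
        alpha = 1.0 / (l + s)  # step 3: shrink xi_l and add mass alpha at x_l
        weights = (1.0 - alpha) * weights
        hit = np.flatnonzero(np.isclose(points, x_l))
        if hit.size:
            weights[hit[0]] += alpha
        else:
            points = np.append(points, x_l)
            weights = np.append(weights, alpha)
    return points, weights, history


LAMS = [1, 3, 5, 7, 9]
pts, w, hist = error_robust_design(LAMS, [-1.0, 0.0, 1.0])
```

With harmonic step sizes and a grid search, the result only approximates the error-robust design; the worst-case G-efficiency under each $\lambda$ is 3 divided by the per-$\lambda$ maximum, and the support points and weights returned should not be expected to match the paper's reported design digit for digit.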


As a simple illustration, consider again the quadratic regression model $f^{T}(x) = (1, x, x^2)$ with $\mathcal{E} = \{\sigma^2(x) : \sigma^2(x) \propto (\lambda - 1)x + (\lambda + 1),\ \lambda = 1, 3, 5, 7, 9\}$. The following design was found to be error-robust using the algorithm described above: $\xi(\pm 1) = 0.325$, $\xi(0.039609) = 0.182$, $\xi(-0.260323) = 0.167$. Table 2 presents G-efficiencies of designs constructed under varying assumptions about $\lambda$. For example, the first row summarizes the performance of the optimal design under the assumption $\lambda = 1$, for various alternative 'true' efficiency functions. As noted previously, if $\lambda$ turns out to be 9 by subsequent analysis, the design will be 88.8% G-efficient. The worst case occurs in the lower left-hand corner of the table. Here the experimenter has assumed $\lambda = 9$, when $\lambda$ turns out to be 1, in which case the G-efficiency of the D-optimal design is 85.4%. In contrast, the worst-case G-efficiency of the error-robust design is 97.4%. Interestingly, the error-robust design consists of 4 support points. Intuitively, mass at $x = 0.039609$ was required to protect against $\lambda = 1$, whereas mass at $x = -0.260323$ was required for protection against $\lambda = 9$. This intuitive explanation is supported by the fact that during execution of the computer algorithm, maximization of $\sigma^{-2}(x)\,d(x,\xi_l,\theta_0)$ occurred only at $\lambda = 1$ or $\lambda = 9$. This example suggests that for $\lambda \in [a, b]$, in some cases a reasonable approximation to the error-robust design will be obtained by mixing the D-optimal designs $\xi_a$ and $\xi_b$ appropriately.

TABLE 2

G-efficiencies of various designs ξ_λ for quadratic regression on χ = [-1, 1], σ²(x) ∝ (λ - 1)x + (λ + 1)

                 Actual λ
Design      1       3       5       7       9
ξ₁          1.0     0.958   0.924   0.902   0.888
ξ₃          0.948   1.0     0.996   0.998   0.994
ξ₅          0.914   0.993   1.0     0.998   0.994
ξ₇          0.876   0.979   0.997   1.0     0.999
ξ₉          0.854   0.969   0.992   0.998   1.0
Robust      0.974   0.981   0.979   0.978   0.974


4  POWER-TRANSFORMATION  ROBUST  DESIGN 

Recently Carroll and Ruppert [29] introduced a method, power transformation on both sides (PTBS), for simultaneous estimation of regression parameters and an appropriate power transformation index. They discussed its use with known, nonlinear regression models. Suppose the known mean structure, which may be derived, for example, from a physical system, is $E(y(x)) = \eta(x,\theta)$, and that $\eta(x,\theta) > 0$ for $x \in \chi$. Errors ($\varepsilon$) are not necessarily additive (or constant over $\chi$), implying

$$y(x) = g(\eta(x,\theta), \varepsilon).$$

For example, if the errors are log normal and $g(a,b) = ab$ (i.e., errors are multiplicative), taking logs yields

$$\log y(x) = \log\eta(x,\theta) + e,$$

where $e = \log\varepsilon$ is normally distributed. This type of situation led Carroll and Ruppert to consider a family of strictly monotonic transformations $h(y,\lambda)$, indexed by the $q$-vector $\lambda$, and to assume that for some value of $\lambda$, say $\lambda_0$,

$$h(y(x),\lambda_0) = h(\eta(x,\theta),\lambda_0) + \varepsilon.$$

This approach is in the spirit of Box and Cox [36], who suggested the well-known power transformation family:

$$h(y, \lambda) = \begin{cases} (y^{\lambda} - 1)/\lambda & \text{if } \lambda \neq 0 \\ \log(y) & \text{if } \lambda = 0 \end{cases}$$
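The Box-Cox family can be sketched in Python as follows. The sketch assumes NumPy, and the function name `box_cox` is our own, not the authors'. As a usage example, it simulates multiplicative log-normal errors around hypothetical mean values. It then checks that transforming both sides with $\lambda = 0$ recovers the additive normal errors, as in the log example above.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox power transformation h(y, lam), defined for y > 0."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox transform requires y > 0")
    if lam == 0:
        return np.log(y)              # limiting case lam -> 0
    return (y ** lam - 1.0) / lam

# Multiplicative log-normal errors: y = eta * exp(eps), eps normal.
rng = np.random.default_rng(0)
eta = np.array([1.0, 2.5, 4.0, 7.5])          # hypothetical mean values
eps = rng.normal(scale=0.1, size=eta.size)    # errors on the log scale
y = eta * np.exp(eps)

# Transforming both sides with lam = 0 gives additive, homoscedastic errors.
resid = box_cox(y, 0) - box_cox(eta, 0)
print(resid - eps)
```

Treating $\lambda = 0$ as a separate branch mirrors the piecewise definition. The general formula $(y^\lambda - 1)/\lambda$ tends to $\log y$ as $\lambda \to 0$.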

Box and Cox sought a transformation that achieves (a) a simple additive or linear model, (b) homoscedastic errors, and (c) normally distributed errors. In PTBS regression, both the response and the model are transformed via $h$. An important advantage of PTBS regression is that the original meaning of the parameters is preserved. Estimation of $\theta$ and $\lambda$ in PTBS regression is typically carried out via normal-theory maximum likelihood.

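For the above model, we have

$$\frac{\partial h(\eta(x, \theta), \lambda)}{\partial \theta} = \eta(x, \theta)^{\lambda - 1} f(x, \theta)$$

where $f(x, \theta)$ is as defined previously: $f(x, \theta) = \partial \eta(x, \theta)/\partial \theta$. Given $\lambda$, the information matrix and variance function are defined as

$$M_\lambda(\xi, \theta) = \int_{\mathcal{X}} \eta(x, \theta)^{2(\lambda - 1)} f(x, \theta) f^T(x, \theta)\, d\xi(x)$$

and

$$d_\lambda(x, \xi, \theta) = \eta(x, \theta)^{2(\lambda - 1)} f^T(x, \theta) M_\lambda^{-1}(\xi, \theta) f(x, \theta)$$

respectively.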

The above expressions indicate that the design problem may be viewed as standard, with induced efficiency function $\sigma^{-2}(x, \lambda) = \eta(x, \theta)^{2(\lambda - 1)}$. It is also apparent that the choice of design will depend on the experimenter's a priori suspicions concerning $\lambda$. Typically, one takes $\lambda = 1$ and hopes for the best, although the consequences can be dire. For example, suppose that the underlying theoretical model is quadratic and the errors are multiplicative and log normal. That is, $\eta(x, \theta) = \theta_1 + \theta_2 x + \theta_3 x^2$, and $\lambda = 0$ gives the appropriate transformation. For $\theta_0^T = (1, 1, 1)$ and $\mathcal{X} = [0, 1]$, the design $\xi_0(\pm 1) = \xi_0(0.373) = 1/3$ is D-optimal. On the other hand, if the experimenter assumes $\lambda = 1$, an obvious choice might be the usual D-optimal design $\xi_1$, which places 1/3 mass at each of the points $\pm 1$ and 0. Since $\max_{x \in \mathcal{X}} d_0(x, \xi_1, \theta_0) = 3.56$, $\xi_1$ is only $3/3.56 \approx 84\%$ G-efficient. If the appropriate $\lambda$ is $-1$ or 2, the G-efficiency of $\xi_1$ drops to 47% in both cases.
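The following Python sketch evaluates $M_\lambda$, $d_\lambda$ and the G-efficiency $p / \max_x d_\lambda(x, \xi, \theta_0)$ for a discrete design, with $\mathcal{X}$ approximated by a grid. It uses NumPy only, and all function names are hypothetical. As a check, it uses the quadratic model on $[-1, 1]$ with the equally weighted design at $-1$, 0 and 1, which is known to be G-optimal when $\lambda = 1$. The values for other $\lambda$ depend on the assumed $\theta_0$ and region. The sketch is not intended to reproduce the figures quoted above.

```python
import numpy as np

def eta(x, theta):
    """Quadratic mean function eta(x, theta) = theta1 + theta2*x + theta3*x^2."""
    return theta[0] + theta[1] * x + theta[2] * x ** 2

def grad(x):
    """f(x) = d eta / d theta for the quadratic model (one row per x)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.column_stack([np.ones_like(x), x, x ** 2])

def info_matrix(points, weights, theta, lam):
    """M_lam(xi, theta) = sum_i w_i * eta_i^(2(lam-1)) * f_i f_i^T."""
    F = grad(points)
    s = weights * eta(points, theta) ** (2.0 * (lam - 1.0))
    return (F * s[:, None]).T @ F

def d_lam(x, points, weights, theta, lam):
    """Standardized variance eta(x)^(2(lam-1)) * f(x)^T M_lam^{-1} f(x)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    F = grad(x)
    Minv = np.linalg.inv(info_matrix(points, weights, theta, lam))
    quad = np.einsum("ij,jk,ik->i", F, Minv, F)
    return eta(x, theta) ** (2.0 * (lam - 1.0)) * quad

def g_efficiency(points, weights, theta, lam, grid):
    """G-efficiency p / max_x d_lam(x, xi, theta) over a grid approximating X."""
    return len(theta) / d_lam(grid, points, weights, theta, lam).max()

theta0 = np.array([1.0, 1.0, 1.0])
xi1_points = np.array([-1.0, 0.0, 1.0])
xi1_weights = np.full(3, 1.0 / 3.0)
grid = np.linspace(-1.0, 1.0, 2001)

eff = {lam: g_efficiency(xi1_points, xi1_weights, theta0, lam, grid)
       for lam in (-1, 0, 1, 2)}
print(eff)
```

Because $\int d_\lambda \, d\xi = p$ for any nonsingular design, the maximum over a grid containing the support points is at least $p$. The computed G-efficiency therefore never exceeds 1.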

The above discussion motivates the need for designs for PTBS regression that are robust to the specification of $\lambda$ for $\lambda$ in a specified set $L$. We offer the following.


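Definition 2. The design $\xi^* \in \Xi$ is power-transformation (PT) robust if and only if

$$\min_{\xi \in \Xi} \max_{\lambda \in L} \max_{x \in \mathcal{X}} d_\lambda(x, \xi) = \max_{\lambda \in L} \max_{x \in \mathcal{X}} d_\lambda(x, \xi^*)$$

where $d_\lambda(x, \xi) = d_\lambda(x, \xi, \theta_0)$.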

As noted, for a specified regression function $\eta(x, \theta)$, the Carroll and Ruppert family of transformations indexed by $\lambda \in L$ induces a corresponding family of efficiency functions

$$E_L = \{\eta(x, \theta)^{2(\lambda - 1)} : \lambda \in L\}$$

Thus Definition 2 may be restated in the following way.

Definition 3. The design $\xi^* \in \Xi$ is PT-robust if and only if $\xi^*$ is error-robust for $E_L$.


Since PT-robustness is a special case of error-robustness, the algorithm previously developed for computer construction of robust designs is applicable.
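The paper's own construction algorithm is the one developed earlier for error-robust designs, and it is not reproduced here. Purely to illustrate the criterion in Definition 2, the sketch below uses a different, crude method: a keep-best random search over design weights on a fixed candidate set. It minimizes $\max_{\lambda \in L} \max_x d_\lambda(x, \xi)$ and makes no optimality guarantee. The code uses NumPy only, and the function names are hypothetical. The quadratic model, $\theta_0$, candidate set and $L$ are assumptions chosen for the example.

```python
import numpy as np

def eta(x, theta):
    """Quadratic mean function; assumed positive on the candidate set."""
    return theta[0] + theta[1] * x + theta[2] * x ** 2

def worst_case_variance(w, cand, theta, lams):
    """Max over lam in L and x in cand of d_lam(x, xi_w, theta)."""
    F = np.column_stack([np.ones_like(cand), cand, cand ** 2])   # rows f(x)
    worst = -np.inf
    for lam in lams:
        s = eta(cand, theta) ** (2.0 * (lam - 1.0))  # induced efficiency function
        M = (F * (w * s)[:, None]).T @ F             # M_lam for weights w
        quad = np.einsum("ij,jk,ik->i", F, np.linalg.inv(M), F)
        worst = max(worst, float((s * quad).max()))
    return worst

def pt_robust_search(cand, theta, lams, iters=2000, step=0.3, seed=0):
    """Keep-best random search over design weights (illustrative only)."""
    rng = np.random.default_rng(seed)
    w = np.full(cand.size, 1.0 / cand.size)          # start from uniform weights
    best = worst_case_variance(w, cand, theta, lams)
    for _ in range(iters):
        trial = w * np.exp(step * rng.normal(size=w.size))  # multiplicative jitter
        trial /= trial.sum()                                  # stay on the simplex
        val = worst_case_variance(trial, cand, theta, lams)
        if val < best:                                        # accept improvements only
            w, best = trial, val
    return w, best

theta0 = np.array([1.0, 1.0, 1.0])
cand = np.linspace(-1.0, 1.0, 21)
lams = [-1.0, 0.0, 1.0, 2.0]
uniform = np.full(cand.size, 1.0 / cand.size)

w_rob, phi = pt_robust_search(cand, theta0, lams)
print("worst-case G-efficiency:", 3 / phi)
print("uniform design:", 3 / worst_case_variance(uniform, cand, theta0, lams))
```

Dividing $p = 3$ by the minimized criterion converts it to the worst-case G-efficiency over $L$, relative to the candidate set. The criterion can never fall below $p$, because $d_\lambda$ averages to $p$ under $\xi$.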


5  TRANSFORMATION  ROBUSTNESS  APPLICATIONS 

The following two examples are taken from the literature and are frequently cited in papers on nonlinear design. They illustrate how inefficient the usual D-optimal designs can be in the presence of uncertainty about the error structure, and they show the efficacy of the robust approach.


Example 1. The following experiment was reported by Box and Hunter [20] and has been discussed by numerous authors. The purpose of the experiment is to model some catalytic reactions of the type $R \rightarrow P_1 + P_2$, in which the reagent $R$ is some quaternary or primary alcohol from a long chain, the product $P_1$ is an olefin and the product $P_2$ is water. The theoretical model for such a reaction is

$$\eta(x, \theta) = \frac{\theta_3 \theta_1 x_1}{1 + \theta_1 x_1 + \theta_2 x_2}$$

where $\eta$ is the speed of the chemical reaction, $x_1$ is the partial pressure of the product $P_1$, $x_2$ is the partial pressure of the product $P_2$, $\theta_1$ is a reaction parameter, $\theta_2$ is the adsorption equilibrium constant for the product $P_2$, and $\theta_3$ is the effective constant of the reagent $R$.
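Locally optimal design for this model needs only its gradient $f(x, \theta) = \partial \eta / \partial \theta$. The Python sketch below codes $\eta$ and its analytic gradient, using NumPy; the function names are ours. It checks the gradient against central finite differences at the support points of the $\lambda = 1$ design reported for this example, using the prior values $\theta_0$ given in the next paragraph.

```python
import numpy as np

THETA0 = np.array([2.9, 12.2, 6.9])   # prior values used in Example 1

def eta_bh(x, theta):
    """Rate model eta = theta3*theta1*x1 / (1 + theta1*x1 + theta2*x2)."""
    x1, x2 = x[..., 0], x[..., 1]
    t1, t2, t3 = theta
    return t3 * t1 * x1 / (1.0 + t1 * x1 + t2 * x2)

def grad_bh(x, theta):
    """Analytic gradient f(x, theta) = d eta / d theta, shape (..., 3)."""
    x1, x2 = x[..., 0], x[..., 1]
    t1, t2, t3 = theta
    den = 1.0 + t1 * x1 + t2 * x2
    return np.stack([t3 * x1 * (1.0 + t2 * x2) / den ** 2,   # d eta / d theta1
                     -t3 * t1 * x1 * x2 / den ** 2,          # d eta / d theta2
                     t1 * x1 / den],                          # d eta / d theta3
                    axis=-1)

def grad_fd(x, theta, h=1e-6):
    """Central finite-difference gradient, used only as a check."""
    cols = []
    for k in range(len(theta)):
        e = np.zeros(len(theta))
        e[k] = h
        cols.append((eta_bh(x, theta + e) - eta_bh(x, theta - e)) / (2 * h))
    return np.stack(cols, axis=-1)

pts = np.array([[0.3, 0.0], [3.0, 0.0], [3.0, 0.8]])   # Fig. 1a support points
print(grad_bh(pts, THETA0))
```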

For purposes of design construction, following Box and Hunter [20], the prior values of the parameters were fixed at $\theta_0^T = (2.9, 12.2, 6.9)$. It was assumed that observations are possible in the region $\mathcal{X} = \{(x_1, x_2) \mid 0 \le x_1 \le 3,\ 0 \le x_2 \le 3\}$, which leads to the locally D-optimal design $\xi_D(0.3, 0.0) = \xi_D(3.0, 0.0) = \xi_D(3.0, 0.8) = 1/3$. This design and $\mathcal{X}$ are pictured in Fig. 1a. The fact that the design does not cover the design space leads one to question the logic of the design, unless the experimenter has particularly strong faith in his assumptions. The D-optimal designs for $\lambda = -1$ and $\lambda = 0$ are pictured in Figs. 1b and 1c, respectively. Note that for $\lambda < 1$ the efficiency function is undefined at
$x_1 = 0$. Thus it was necessary to truncate the design space to $\mathcal{X} = \{(x_1, x_2) \mid \delta \le x_1 \le 3,\ 0 \le x_2 \le 3\}$ for some $\delta > 0$. We chose $\delta = 0.1$. The truncation is indicated in Figs. 1b and 1c.

The PT-robust design is pictured in Fig. 1d. G-efficiencies of the robust design for various true $\lambda$ are summarized in Table 3. Notice that the worst-case G-efficiencies occur at $\lambda = \pm 1$, both being about 66%. As was indicated in the previous example, the error-robust design seems to represent the best trade-off possible between the D-optimal designs for $\lambda = \pm 1$. Table 3 also shows that this 66% G-efficiency is high in comparison to the 0% G-efficiency that results when we assume $\lambda = 0$ and $\lambda$ turns out to be 1.

Fig. 1. Optimal designs for Example 1, $\mathcal{X} = [0, 3]^2$. (a) Optimal design for $\lambda = 1$: $\xi(0.3, 0) = \xi(3, 0) = \xi(3, 0.8) = 1/3$. (b) Optimal design for $\lambda = -1$: $\xi(0.1, 0) = \xi(0.1, 3) = 1/3$, $\xi(3, 0) = \xi(3, 3) = 1/6$. (c) Optimal design for $\lambda = 0$: $\xi(0.1, 0) = \xi(3, 0) = \xi(0.1, 3) = 1/3$. (d) Robust design: $\xi(3, 0) = 0.216$; $\xi(3, 0.8) = 0.17$; $\xi(0.3, 0) = 0.101$; $\xi(0.1, 3) = 0.227$; $\xi(0.1, 0) = 0.212$; $\xi(3, 1.7) = 0.073$.
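The need to truncate the region for $\lambda < 1$ can be shown numerically. The Python sketch below uses NumPy, hypothetical names and the prior $\theta_0$ of Example 1. It evaluates the induced efficiency function $\eta^{2(\lambda - 1)}$ for $\lambda = -1$ on a grid over $[0, 3]^2$. The function is infinite along $x_1 = 0$, where $\eta = 0$, and finite once $x_1 \ge 0.1$.

```python
import numpy as np

THETA0 = np.array([2.9, 12.2, 6.9])   # prior parameter values from Example 1

def eta_bh(x1, x2, theta=THETA0):
    """Rate model eta = theta3*theta1*x1 / (1 + theta1*x1 + theta2*x2)."""
    t1, t2, t3 = theta
    return t3 * t1 * x1 / (1.0 + t1 * x1 + t2 * x2)

def efficiency(x1, x2, lam):
    """Induced efficiency eta^(2(lam-1)); infinite where eta = 0 and lam < 1."""
    with np.errstate(divide="ignore"):      # silence the 0 ** negative warning
        return eta_bh(x1, x2) ** (2.0 * (lam - 1.0))

g = np.linspace(0.0, 3.0, 31)
X1, X2 = np.meshgrid(g, g)          # x1 changes from column to column
eff_full = efficiency(X1, X2, -1.0)
keep = X1 >= 0.1                    # truncated region with delta = 0.1
print(np.isinf(eff_full).sum(), "grid points with infinite efficiency")
```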


TABLE 3

G-efficiencies for designs in Example 1

λ assumed    True λ = -1    True λ = 0    True λ = 1
-1           1.000          0.520         0.220
0            0.997          1.000         0.000
1            0.000          0.214         1.000
Robust       0.656          0.754         0.662

Example 2. The following model was studied by Carr [37]:

$$\eta(x, \theta) = \frac{\theta_1 (x_2 - x_3/1.632)}{1 + \theta_2 x_1 + \theta_3 x_2 + \theta_4 x_3}$$

where $\eta$ is the rate of disappearance of n-pentane, $x_1$, $x_2$, $x_3$ are the partial pressures of hydrogen, n-pentane and isopentane, respectively, $\theta_1$ is a reaction parameter, and $\theta_2$, $\theta_3$, $\theta_4$ are equilibrium constants (psia$^{-1}$). For this problem, $\mathcal{X} = \{(x_1, x_2, x_3) \mid 107 \le x_1 \le 471,\ 69 \le x_2 \le 294,\ 11 \le x_3 \le 121\}$. Box and Hill [38] later used power
transformation weighting to fit the model to Carr's 24 observations. We consider the construction of a PT-robust design.

Fig. 2. Optimal designs for Example 2. (a) Optimal design for $\lambda = 0$: $\xi(107, 294, 11) = \xi(471, 69, 11) = \xi(107, 69, 11) = \xi(107, 125.25, 121) = 0.25$. (b) Optimal design for $\lambda = 0.5$: $\xi(107, 294, 11) = \xi(471, 294, 11) = \xi(107, 69, 11) = \xi(107, 294, 121) = 0.25$. (c) Optimal design for $\lambda = 1$: $\xi(107, 294, 11) = \xi(380, 294, 11) = \xi(107, 125.25, 11) = \xi(107, 294, 93.5) = 0.25$. (d) Robust design: $\xi(107, 294, 11) = 0.189$, $\xi(107, 125.25, 11) = 0.074$, $\xi(107, 294, 93.5) = 0.116$, $\xi(380, 294, 11) = 0.181$, $\xi(107, 69, 93.5) = 0.099$, $\xi(471, 69, 11) = 0.118$, $\xi(107, 69, 11) = 0.142$, $\xi(107, 294, 121) = 0.082$.

TABLE 4

G-efficiencies for designs in Example 2

λ assumed    True λ = 0    True λ = 1/2    True λ = 1
0            1.000         0.350           0.094
1/2          0.536         1.000           0.822
1            0.304         0.663           1.000
Robust       0.768         0.767           0.768
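As for Example 1, the Carr model and its analytic gradient can be coded in Python and checked against central finite differences. The sketch uses NumPy, and the names are ours. The parameter values are the PTBS point estimates of $\theta$ quoted in the next paragraph. The evaluation points are the support points of the robust design in Fig. 2d, and the constant 1.632 is taken from the model as printed.

```python
import numpy as np

# Illustrative parameter values: the PTBS point estimates of theta quoted in the text.
THETA0 = np.array([39.2, 0.043, 0.021, 0.104])

def eta_carr(x, theta):
    """eta = theta1*(x2 - x3/1.632) / (1 + theta2*x1 + theta3*x2 + theta4*x3)."""
    x1, x2, x3 = x[..., 0], x[..., 1], x[..., 2]
    t1, t2, t3, t4 = theta
    return t1 * (x2 - x3 / 1.632) / (1.0 + t2 * x1 + t3 * x2 + t4 * x3)

def grad_carr(x, theta):
    """Analytic f(x, theta) = d eta / d theta, shape (..., 4)."""
    x1, x2, x3 = x[..., 0], x[..., 1], x[..., 2]
    t1, t2, t3, t4 = theta
    num = x2 - x3 / 1.632
    den = 1.0 + t2 * x1 + t3 * x2 + t4 * x3
    common = -t1 * num / den ** 2   # d eta / d theta_k = common * x_(k-1), k = 2, 3, 4
    return np.stack([num / den, common * x1, common * x2, common * x3], axis=-1)

def grad_fd(x, theta, h=1e-6):
    """Central finite differences, used only to check grad_carr."""
    cols = []
    for k in range(len(theta)):
        e = np.zeros(len(theta))
        e[k] = h
        cols.append((eta_carr(x, theta + e) - eta_carr(x, theta - e)) / (2 * h))
    return np.stack(cols, axis=-1)

# Support points of the robust design in Fig. 2d.
pts = np.array([[107, 294, 11], [107, 125.25, 11], [107, 294, 93.5],
                [380, 294, 11], [107, 69, 93.5], [471, 69, 11],
                [107, 69, 11], [107, 294, 121]], dtype=float)
print(grad_carr(pts, THETA0))
```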

Carroll and Ruppert obtained the PTBS parameter estimates $(\hat{\theta}, \hat{\lambda}) = (39.2, 0.043, 0.021, 0.104, 0.72)$. These point estimates for $\theta$ are used as prior values in what follows. Reasonable values of $\lambda$ were thought to lie in the interval $L = [0, 1]$. Computational constraints forced us to discretize both $L$ and $\mathcal{X}$ rather severely. We took $L = \{0, 0.5, 1.0\}$, and to approximate $\mathcal{X}$ we used as a candidate set the points of the $5^3$ factorial region. The error-robust design is pictured in Fig. 2d. For reference, the D-optimal designs for $\lambda = 0$, 1/2 and 1 are pictured in Figs. 2a-c. Table 4 gives G-efficiencies for the various designs and assumptions about $\lambda$. For example, a G-efficiency of 30.4% occurs when $\lambda = 0$ and constant variance is assumed. In contrast, the minimum G-efficiency for the error-robust design is 76%.
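The discretization can be sketched in Python with NumPy and `itertools`; the variable names are ours. Equally spaced factor levels are our assumption. The coordinates appearing in Fig. 2 (for example 125.25, 380 and 93.5) are consistent with it, and the check below confirms this for the robust design's support points.

```python
import itertools
import numpy as np

# Assumption: the 5^3 factorial candidate set uses five equally spaced levels per factor.
levels = [np.linspace(107.0, 471.0, 5),   # x1, hydrogen
          np.linspace(69.0, 294.0, 5),    # x2, n-pentane
          np.linspace(11.0, 121.0, 5)]    # x3, isopentane
cand = np.array(list(itertools.product(*levels)))   # 125 candidate points
L = [0.0, 0.5, 1.0]                                  # discretized lambda set

# Support points of the robust design in Fig. 2d should all be candidates.
robust_support = np.array([[107, 294, 11], [107, 125.25, 11], [107, 294, 93.5],
                           [380, 294, 11], [107, 69, 93.5], [471, 69, 11],
                           [107, 69, 11], [107, 294, 121]], dtype=float)
on_grid = [bool(np.any(np.all(np.isclose(cand, p), axis=1))) for p in robust_support]
print(cand.shape, all(on_grid))
```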


6  CONCLUSIONS 

In this paper we have summarized research directed toward the characterization of designs that are insensitive to the specification of error structure. We have developed the related concepts of error and transformation robustness and examined a number of designs that were approximately optimal by our stated criterion. Some obvious extensions, however, are still needed. While the designs calculated are reasonably robust to the specification of error structure in the nonlinear case, they suffer from the need to specify $\theta$ a priori. One way of alleviating this difficulty may be to combine the minimax approach suggested herein with the methods of Bayesian nonlinear design as described in ref. 18. Such methods are currently under investigation.